This talk is a general discussion of the current state of Open MPI, and a deep dive on two new features:
1. The flexible process affinity system (I presented many of these slides at the Madrid EuroMPI'13 conference in September 2013).
2. The MPI-3 "MPI_T" tools interface.
I originally gave this talk at Lawrence Berkeley Labs on Thursday, November 7, 2013.
Open MPI Explorations in Process Affinity (EuroMPI'13 presentation)Jeff Squyres
The document discusses Open MPI's Locality-Aware Mapping Algorithm (LAMA) interface for controlling process placement on parallel machines. It describes how LAMA allows users to specify regular mapping patterns of processes to resources. It also outlines the three main steps in MPI process placement with LAMA: 1) mapping processes to resources, 2) ordering processes, and 3) binding processes during launch according to the mapping. The goal is to provide a mechanism for exploring different process placements to minimize communication costs.
Open MPI State of the Union X SC'16 BOFJeff Squyres
This document summarizes updates from the Open MPI State of the Union X Community Meeting at SC'16. It discusses changes to Open MPI's GitHub repository and contribution policy, the versioning scheme and roadmap for future versions, and lesser known features of Open MPI including support for Singularity containers, the ORTE Distributed Virtual Machine, the OMPIO parallel I/O library, AWS scale testing, and Open MPI's involvement in the Exascale Computing Project.
Fun with Github webhooks: verifying Signed-off-byJeff Squyres
An overview of an afternoon project I noodled around with one day to play with Ruby and Github Webhooks. I surprised myself by creating something somewhat actually useful.
Presentation given to the Kentucky Open Source Society (KyOSS) on July 8, 2015.
Slides presented by Jeff Squyres at the 2015 OpenFabrics Software Developers' Workshop. This talk discusses Cisco's experiences implementing an ultra-low latency Ethernet plugin / provider for the Linux Verbs API and for the Libfabric API.
This document summarizes updates from the Open MPI State of the Union community meeting at SC'15. Key points include: the new Open MPI versioning scheme uses A.B.C triples where A breaks backwards compatibility, B adds features, and C are bug fixes; v1.10.1 was recently released and adds some new features while maintaining backwards compatibility; v2.0 is targeted for Q1 2016 and will include many new features and be MPI-3.1 compliant; and integration with technologies like UCX and PMIx was discussed.
Spend some time working with OpenAPI and gRPC and you’ll notice that these two technologies have a lot in common. Both are open source efforts, both describe APIs, and both promise better experiences for API producers and consumers. So why do we need both? If we do, what value does each provide? What can each project learn from the other? We’ll bring the two together for a side-by-side comparison and pose answers to these and other questions about two API methodologies that will do much to influence the future of networked APIs.
Open MPI new version number scheme and roadmapJeff Squyres
The document discusses Open MPI's transition to a new version numbering scheme and release planning roadmap. Open MPI will move from an "odd/even" numbering scheme to a "A.B.C" scheme where A changes for backwards incompatible releases, B changes for new features, and C changes for bug fixes. Version 1.10.0 will start using the new scheme, with larger new features planned for version 2.0.0 later in the year. Future release series are planned to be supported for around 2 years each.
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...Aljoscha Krettek
Flink is a great stream processor, Python is a great programming language, Apache Beam is a great programming model and portability layer. Using all three together is a great idea! We will demo and discuss writing Beam Python pipelines and running them on Flink. We will cover Beam's portability vision that led here, what you need to know about how Beam Python pipelines are executed on Flink, and where Beam's portability framework is headed next (hint: Python pipelines reading from non-Python connectors)
Slides presented by Jeff Squyres at the 2015 OpenFabrics Software Developers' Workshop. This talk discusses the current state and future plans for the use of Libfabric in Open MPI.
The Lagopus Router is a modular, open-source software router developed by the Lagopus Project. It is written in C and Golang for high performance packet processing using DPDK. The router has a modular architecture that provides high extensibility for handling network protocols. Current supported protocols include L2, L3, tunneling and IPsec. The goal of the project is to develop an open-source, high performance software router that supports many protocols and can be customized for various network functions and services.
This document provides an overview of the NCTU P4 Workshop. It discusses:
- The P4 programming language which allows specifying how switches process packets in a protocol-independent and target-independent way.
- Key concepts in P4 including headers, parsers, tables, actions, and the control flow.
- An example P4 architecture and how to define headers, parsers, tables, and the control flow.
- How to get started with P4 including setting up the behavioral model compiler and runtime environment and using Mininet to test P4 programs.
- A quick demo of a simple P4 program that uses a custom header to implement path routing and can be configured via the runtime
zebra is an open source successor to the GNU Zebra and Quagga projects. Together with openconfigd, it will work as a data-plane-agnostic Network Operation Stack that works with various protocol / functional modules.
The webinar discussed accelerating P4 and eBPF programs on Netronome SmartNIC hardware. It covered the Linux kernel infrastructure like TC and XDP that supports offloading eBPF programs. It also explained how the NFP architecture is optimized for network flow processing with its multi-core design and memory hierarchy. The webinar demonstrated how eBPF programs can be translated to run efficiently on the NFP hardware by handling maps and applying optimizations.
BPF: Next Generation of Programmable DatapathThomas Graf
This session covers lessons learned while exploring BPF to provide a programmable datapath based on BPF and discusses options for OVS to leverage the technology.
The document discusses P4 support in ONOS. It provides an overview of the P4 language and P4Runtime framework, and then describes ONOS's PI framework for controlling P4 programmable switches. The PI framework models P4 pipelines and allows both pipeline-agnostic and pipeline-aware applications. It translates between ONOS abstractions and P4Runtime messages using a pipeline interpreter and driver behaviors defined in a pipeconf file. The document demonstrates how ONOS can deploy and program P4 pipelines using these components.
HKG15-301: OVS implemented via ODP & vendor SDKsLinaro
HKG15-301: OVS implemented via ODP & vendor SDKs
---------------------------------------------------
Speaker: Ciprian Barbu, Zoltan Kiss
Date: February 11, 2015
---------------------------------------------------
★ Session Summary ★
Comparison of OVS implemented via ODP & vendor SDKs. Contrasting ODP linux-generic with the native Intel DPDK SDK and ODP implemented using the DPDK SDK on X86. Additionally comparing ODP linux-generic with ODP implemented using the Texas Instruments SDK on A15 ARM
--------------------------------------------------
★ Resources ★
Pathable: https://hkg15.pathable.com/meetings/250805
Video: N/A
Etherpad: N/A
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2015 - #HKG15
February 9-13th, 2015
Regal Airport Hotel Hong Kong Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
Cilium - Fast IPv6 Container Networking with BPF and XDPThomas Graf
We present a new open source project which provides IPv6 networking for Linux Containers by generating programs for each individual container on the fly and then runs them as JITed BPF code in the kernel. By generating and compiling the code, the program is reduced to the minimally required feature set and then heavily optimised by the compiler as parameters become plain variables. The upcoming addition of the Express Data Plane (XDP) to the kernel will make this approach even more efficient as the programs will get invoked directly from the network driver.
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)Thomas Graf
Open vSwitch (OVS) has long been a critical component of Neutron's reference implementation, offering reliable and flexible virtual switching for cloud environments.
Being an early adopter of the OVS technology, Neutron's reference implementation made some compromises to stay within the early, stable featureset OVS exposed. In particular, Security Groups (SG) have been so far implemented by leveraging hybrid Linux Bridging and IPTables, which come at a significant performance overhead. However, thanks to recent developments and ongoing improvements within the OVS community, we are now able to implement feature-complete security groups directly within OVS.
In this talk we will summarize the existing Security Groups implementation in Neutron and compare its performance with the Open vSwitch-only approach. We hope this analysis will form the foundation of future improvements to the Neutron Open vSwitch reference design.
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server AdaptersOpen-NFP
Commodity servers equipped with intelligent server adapters (ISAs) are being used as platforms for Network Functions Virtualization (NFV). The network traffic processing required by a specific use case is frequently expressed by forming a chain of Virtual Network Functions (VNFs). This demonstration illustrates that VNFs in the chain can be hosted on the server CPU or on the ISA. It furthermore illustrates that VNFs can be decomposed into components called Micro-VNFs, with the components again being hosted on the server CPU and/or the ISA. A P4 program (compiled to native code running on the ISA) defines the overall semantics of the datapath within an ISA equipped server and expresses how VNFs and Micro-VNFs should be composed within this platform. We show how mechanisms like tunnels and service headers programmed using P4 are employed to establish the VNF service chain across multiple network nodes.
David George
Lead Engineer, Netronome
David George is a lead engineer on the Netronome SDK team and is primarily responsible for Netronome's P4 data plane. He previously worked on the SDK simulator and x86 data plane components. He holds a Masters of Electrical Engineering from the University of Cape Town.
This work presents a P4 compiler backend targeting XDP, the eXpress Data Path. P4 is a domain-specific language describing how packets are processed by the data plane of programmable network elements. XDP is designed for users who want programmability as well as performance.
https://github.com/williamtu/p4c-xdp/
SFO15-102:ODP Project Update
Speaker: Bill Fischofer
Date: September 21, 2015
★ Session Description ★
The OpenDataPlane project is now two years old and is beginning to see widespread interest on the part of both application writers and platform providers. This talk will discuss recent developments in ODP and its uses and look at what lies ahead for this fast-growing open source project.
★ Resources ★
Video: https://www.youtube.com/watch?v=QxK3waNaVEQ
Presentation: http://www.slideshare.net/linaroorg/sfo15102odp-project-update
Etherpad: pad.linaro.org/p/sfo15-102
Pathable: https://sfo15.pathable.com/meetings/302651
★ Event Details ★
Linaro Connect San Francisco 2015 - #SFO15
September 21-25, 2015
Hyatt Regency Hotel
http://www.linaro.org
http://connect.linaro.org
Linux Native, HTTP Aware Network SecurityThomas Graf
Cilium is open source software for transparently securing the network connectivity between application services deployed using Linux container management platforms like Docker and Kubernetes.
At the foundation of Cilium is a new Linux kernel technology called BPF, which enables the dynamic insertion of powerful security visibility and control logic within Linux itself. Because BPF runs inside the Linux kernel itself, Cilium security policies can be applied and updated without any changes to the application code or container configuration.
HKG15-110: ODP Project Update
---------------------------------------------------
Speaker: Bill Fischofer
Date: February 9, 2015
---------------------------------------------------
★ Session Summary ★
This session provides a summary of ODP activities since LCU '14 and highlights the main features of ODP v1.0 for applications as well as the validations used by conforming ODP implementations.
--------------------------------------------------
★ Resources ★
Pathable: https://hkg15.pathable.com/meetings/250771
Video: https://www.youtube.com/watch?v=xABcGPOCOuU
Etherpad: http://pad.linaro.org/p/hkg15-110
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2015 - #HKG15
February 9-13th, 2015
Regal Airport Hotel Hong Kong Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
The Paxos protocol is the foundation for building many fault-tolerant distributed systems and services. Given the importance of Paxos, any performance improvements to the protocol would have a significant impact on data-center infrastructure. We argue that implementing Paxos in network devices would significantly improve its performance. This talk describes an implementation of Paxos in P4, as well as our on-going efforts to evaluate the implementation on Netronome intelligent server adapters. Implementing Paxos provides a critical use case for P4, and will help drive the requirements for data plane languages in general. In the long term, we imagine that consensus could someday be offered as a network service.
Huynh Tu Dang
Italian University of Switzerland
Huynh Tu Dang is a second-year Ph.D. student in the Faculty of Informatics at Università della Svizzera Italiana. His research focuses on fault-tolerant distributed systems and applications of software-defined networking (SDN). Previously, he worked as a research assistant at Ho Chi Minh International University and was an intern at INRIA, Nice-Sophia Antipolis on the BtrPlace project. He received his Bachelor's degree from the School of Computer Science and Engineering at Ho Chi Minh International University.
LCU14 310- Cisco ODP
---------------------------------------------------
Speaker: Robbie King
Date: September 17, 2014
---------------------------------------------------
★ Session Summary ★
Cisco to present their experience using ODP to provide portable accelerated access to crypto functions on various SoCs.
---------------------------------------------------
★ Resources ★
Zerista: http://lcu14.zerista.com/event/member/137757
Google Event: https://plus.google.com/u/0/events/ckmld1hll5jjijq11frbqmptet8
Video: https://www.youtube.com/watch?v=eFlTmslVK-Y&list=UUIVqQKxCyQLJS6xvSmfndLA
Etherpad: http://pad.linaro.org/p/lcu14-310
---------------------------------------------------
★ Event Details ★
Linaro Connect USA - #LCU14
September 15-19th, 2014
Hyatt Regency San Francisco Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
The Next Generation Firewall for Red Hat Enterprise Linux 7 RCThomas Graf
FirewallD provides firewall management as a service in RHEL 7, abstracting policy definition and handling configuration. The kernel includes new filtering capabilities like connection tracking targets and extended accounting. Nftables, a new packet filtering subsystem to eventually replace iptables, uses a state machine-based approach with a unified nft user interface.
Debugging is an essential part of Linux kernel development. In user-space we have the support of the kernel and many debugging tools. Tracking down a kernel bug, by contrast, can be very difficult if you don't know the proper methodologies. This talk will cover some techniques to understand how the kernel works and to hunt down and fix kernel bugs in order to become a better kernel developer.
The document discusses ONNC, a compiler for deep learning formats like ONNX. It aims to connect ONNX to various deep learning accelerator (DLA) chips to help vendors bring products to market faster. Key features include supporting DLA features, optimizing memory usage and execution time, and being released as open source before the end of July 2018.
Fast and Reliable Apache Spark SQL EngineDatabricks
Building the next generation Spark SQL engine at speed poses new challenges to both automation and testing. At Databricks, we are implementing a new testing framework for assessing the quality and performance of new developments as they are produced. With more than 1,200 worldwide contributors, Apache Spark follows a rapid pace of development. At this scale, new testing tooling such as random query and data generation, fault injection, longevity stress, and scalability tests are essential to guarantee a reliable and performant Spark later in production. By applying such techniques, we will demonstrate the effectiveness of our testing infrastructure by drilling down into cases where correctness and performance regressions have been found early. In addition, we will show how they have been root-caused and fixed to prevent regressions in production and boost the continuous delivery of new features.
Slides presented by Jeff Squyres at the 2015 OpenFabrics Software Developers' Workshop. This talk discusses the current state and future plans for the use of Libfabric in Open MPI.
The Lagopus Router is a modular, open-source software router developed by the Lagopus Project. It is written in C and Golang for high performance packet processing using DPDK. The router has a modular architecture that provides high extensibility for handling network protocols. Current supported protocols include L2, L3, tunneling and IPsec. The goal of the project is to develop an open-source, high performance software router that supports many protocols and can be customized for various network functions and services.
This document provides an overview of the NCTU P4 Workshop. It discusses:
- The P4 programming language which allows specifying how switches process packets in a protocol-independent and target-independent way.
- Key concepts in P4 including headers, parsers, tables, actions, and the control flow.
- An example P4 architecture and how to define headers, parsers, tables, and the control flow.
- How to get started with P4 including setting up the behavioral model compiler and runtime environment and using Mininet to test P4 programs.
- A quick demo of a simple P4 program that uses a custom header to implement path routing and can be configured via the runtime
zebra is an open source implementation as a successor of GNU Zebra and Quagga project. Together with openconfigd, it will work as data plane agnostic Network Operation Stack working with variable protocol / functional modules.
The webinar discussed accelerating P4 and eBPF programs on Netronome SmartNIC hardware. It covered the Linux kernel infrastructure like TC and XDP that supports offloading eBPF programs. It also explained how the NFP architecture is optimized for network flow processing with its multi-core design and memory hierarchy. The webinar demonstrated how eBPF programs can be translated to run efficiently on the NFP hardware by handling maps and applying optimizations.
BPF: Next Generation of Programmable DatapathThomas Graf
This session covers lessons learned while exploring BPF to provide a programmable datapath based on BPF and discusses options for OVS to leverage the technology.
The document discusses P4 support in ONOS. It provides an overview of the P4 language and P4Runtime framework, and then describes ONOS's PI framework for controlling P4 programmable switches. The PI framework models P4 pipelines and allows both pipeline-agnostic and pipeline-aware applications. It translates between ONOS abstractions and P4Runtime messages using a pipeline interpreter and driver behaviors defined in a pipeconf file. The document demonstrates how ONOS can deploy and program P4 pipelines using these components.
HKG15-301: OVS implemented via ODP & vendor SDKsLinaro
HKG15-301: OVS implemented via ODP & vendor SDKs
---------------------------------------------------
Speaker: Ciprian Barbu, Zoltan Kiss
Date: February 11, 2015
---------------------------------------------------
★ Session Summary ★
Comparison of OVS implemented via ODP & vendor SDKs. Contrasting ODP linux-generic with the native Intel DPDK SDK and ODP implemented using the DPDK SDK on X86. Additionally comparing ODP linux-generic with ODP implemented using the Texas Instruments SDK on A15 ARM
--------------------------------------------------
★ Resources ★
Pathable: https://hkg15.pathable.com/meetings/250805
Video: N/A
Etherpad: N/A
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2015 - #HKG15
February 9-13th, 2015
Regal Airport Hotel Hong Kong Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
Cilium - Fast IPv6 Container Networking with BPF and XDPThomas Graf
We present a new open source project which provides IPv6 networking for Linux Containers by generating programs for each individual container on the fly and then runs them as JITed BPF code in the kernel. By generating and compiling the code, the program is reduced to the minimally required feature set and then heavily optimised by the compiler as parameters become plain variables. The upcoming addition of the Express Data Plane (XDP) to the kernel will make this approach even more efficient as the programs will get invoked directly from the network driver.
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)Thomas Graf
Open vSwitch (OVS) has long been a critical component of the Neutron's reference implementation, offering reliable and flexible virtual switching for cloud environments.
Being an early adopter of the OVS technology, Neutron's reference implementation made some compromises to stay within the early, stable featureset OVS exposed. In particular, Security Groups (SG) have been so far implemented by leveraging hybrid Linux Bridging and IPTables, which come at a significant performance overhead. However, thanks to recent developments and ongoing improvements within the OVS community, we are now able to implement feature-complete security groups directly within OVS.
In this talk we will summarize the existing Security Groups implementation in Neutron and compare its performance with the Open vSwitch-only approach. We hope this analysis will form the foundation of future improvements to the Neutron Open vSwitch reference design.
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server AdaptersOpen-NFP
Commodity servers equipped with intelligent server adapters (ISAs) are being used as platforms for Network Functions Virtualization (NFV). The network traffic processing required by a specific use case is frequently expressed by forming a chain of Virtual Network Functions (VNFs). This demonstration illustrates that VNFs in the chain can be hosted on the server CPU or on the ISA. It furthermore illustrates that VNFs can be decomposed into components called Micro-VNFs, with the components again being hosted on the server CPU and/or the ISA. A P4 program (compiled to native code running on the ISA) defines the overall semantics of the datapath within an ISA equipped server and expresses how VNFs and Micro-VNFs should be composed within this platform. We show how mechanisms like tunnels and service headers programmed using P4 are employed to establish the VNF service chain across multiple network nodes.
David George
Lead Engineer, Netronome
David George is a lead engineer on the Netronome SDK team and is primarily responsible for Netronome's P4 data plane. He has previously been worked on the SDK simulator and x86 data plane components. He holds a Masters of Electrical Engineering from the University of Cape Town.
This work presents a P4 compiler backend targeting XDP, the eXpress Data Path. P4 is a domain-specific language describing how packets are processed by the data plane of a programmable network elements. XDP is designed for users who want programmability as well as performance.
https://github.com/williamtu/p4c-xdp/
SFO15-102:ODP Project Update
Speaker: Bill Fischofer
Date: September 21, 2015
★ Session Description ★
The OpenDataPlane project is now two years old and is beginning to see widespread interest on the part of both application writers and platform providers. This talk will discuss recent developments in ODP and its uses and look at what lies ahead for this fast-growing open source project.
★ Resources ★
Video: https://www.youtube.com/watch?v=QxK3waNaVEQ
Presentation: http://www.slideshare.net/linaroorg/sfo15102odp-project-update
Etherpad: pad.linaro.org/p/sfo15-102
Pathable: https://sfo15.pathable.com/meetings/302651
★ Event Details ★
Linaro Connect San Francisco 2015 - #SFO15
September 21-25, 2015
Hyatt Regency Hotel
http://www.linaro.org
http://connect.linaro.org
Linux Native, HTTP Aware Network SecurityThomas Graf
Cilium is open source software for transparently securing the network connectivity between application services deployed using Linux container management platforms like Docker and Kubernetes.
At the foundation of Cilium is a new Linux kernel technology called BPF, which enables the dynamic insertion of powerful security visibility and control logic within Linux itself. Because BPF runs inside the Linux kernel itself, Cilium security policies can be applied and updated without any changes to the application code or container configuration.
HKG15-110: ODP Project Update
---------------------------------------------------
Speaker: Bill Fischofer
Date: February 9, 2015
---------------------------------------------------
★ Session Summary ★
This session provides a summary of ODP activities since LCU ‰Û÷14 and highlights the main features of ODP v1.0 for applications as well as the validations used by conforming ODP implementation.
--------------------------------------------------
★ Resources ★
Pathable: https://hkg15.pathable.com/meetings/250771
Video: https://www.youtube.com/watch?v=xABcGPOCOuU
Etherpad: http://pad.linaro.org/p/hkg15-110
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2015 - #HKG15
February 9-13th, 2015
Regal Airport Hotel Hong Kong Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
The Paxos protocol is the foundation for building many fault-tolerant distributed systems and services. Given the importance of Paxos, and performance improvements to the protocol would have a significant impact on data-center infrastructure. We argue that implementing Paxos in network devices would significantly improve its performance. This talk describes an implementation of Paxos in P4, as well as our on-going efforts to evaluate the implementation on Netronome intelligent server adapters. Implementing Paxos provides a critical use case for P4, and will help drive the requirements for data plane languages in general. In the long term, we imagine that consensus could someday be offered as a network service.
Huynh Tu Dang
Italian University of Switzerland
Huynh Tu Dang is a second-year Ph.D. student in the Faculty of Informatics at Università della Svizzera Italiana. His research focuses on fault-tolerant distributed systems and application of software-defined networking (SDN). Previously, he worked as a research assistant at Ho Chi Minh International University and was an intern at INRIA, Nice-Sophia Antipolis on BtrPlace project. He received his Bachelor's degree from the School of Computer Science and Engineering at Ho Chi Minh International University.
LCU14 310- Cisco ODP
---------------------------------------------------
Speaker: Robbie King
Date: September 17, 2014
---------------------------------------------------
★ Session Summary ★
Cisco to present their experience using ODP to provide portable accelerated access to crypto functions on various SoCs.
---------------------------------------------------
★ Resources ★
Zerista: http://lcu14.zerista.com/event/member/137757
Google Event: https://plus.google.com/u/0/events/ckmld1hll5jjijq11frbqmptet8
Video: https://www.youtube.com/watch?v=eFlTmslVK-Y&list=UUIVqQKxCyQLJS6xvSmfndLA
Etherpad: http://pad.linaro.org/p/lcu14-310
---------------------------------------------------
★ Event Details ★
Linaro Connect USA - #LCU14
September 15-19th, 2014
Hyatt Regency San Francisco Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
The Next Generation Firewall for Red Hat Enterprise Linux 7 RCThomas Graf
FirewallD provides firewall management as a service in RHEL 7, abstracting policy definition and handling configuration. The kernel includes new filtering capabilities like connection tracking targets and extended accounting. Nftables, a new packet filtering subsystem to eventually replace iptables, uses a state machine-based approach with unified nft user interface.
Debugging is an essential part of Linux kernel development. In user space we have the support of the kernel and many debugging tools; tracking down a kernel bug, by contrast, can be very difficult if you don't know the proper methodologies. This talk will cover some techniques to understand how the kernel works and to hunt down and fix kernel bugs in order to become a better kernel developer.
The document discusses ONNC, a compiler for deep learning formats like ONNX. It aims to connect ONNX to various deep learning accelerator (DLA) chips to help vendors bring products to market faster. Key features include supporting DLA features, optimizing memory usage and execution time, and being released as open source before the end of July 2018.
Fast and Reliable Apache Spark SQL EngineDatabricks
Building the next generation Spark SQL engine at speed poses new challenges to both automation and testing. At Databricks, we are implementing a new testing framework for assessing the quality and performance of new developments as they are produced. With more than 1,200 contributors worldwide, Apache Spark follows a rapid pace of development. At this scale, new testing tooling such as random query and data generation, fault injection, longevity stress, and scalability tests is essential to guarantee a reliable and performant Spark later in production. We will demonstrate the effectiveness of our testing infrastructure by drilling down into cases where correctness and performance regressions were found early. In addition, we will show how they were root-caused and fixed to prevent regressions in production and to speed up the continuous delivery of new features.
O'Reilly Velocity New York 2016 presentation on modern Linux tracing tools and technology. Highlights the available tracing data sources on Linux (ftrace, perf_events, BPF) and demonstrates some tools that can be used to obtain traces, including DebugFS, the perf front-end, and most importantly, the BCC/BPF tool collection.
Containerizing HPC and AI applications using E4S and Performance Monitor toolGanesan Narayanasamy
The DOE Exascale Computing Project (ECP) Software Technology focus area is developing an HPC software ecosystem that will enable the efficient and performant execution of exascale applications. Through the Extreme-scale Scientific Software Stack (E4S) [https://e4s.io], it is developing a comprehensive and coherent software stack that will enable application developers to productively write highly parallel applications that can portably target diverse exascale architectures. E4S provides both source builds through the Spack platform and a set of containers that feature a broad collection of HPC software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC users. It provides container images, build manifests, and turn-key, from-source builds of popular HPC software packages developed as Software Development Kits (SDKs). This effort covers a broad range of areas including programming models and runtimes (MPICH, Kokkos, RAJA, OpenMPI), development tools (TAU, HPCToolkit, PAPI), math libraries (PETSc, Trilinos), data and visualization tools (Adios, HDF5, Paraview), and compilers (LLVM), all available through the Spack package manager. The talk will describe the community engagements and interactions that led to the many artifacts produced by E4S. It will introduce the E4S containers being deployed on the HPC systems at DOE national laboratories using the Singularity, Shifter, and Charliecloud container runtimes.
This talk will describe how E4S can support the OpenPOWER platform with NVIDIA GPUs.
The DOE Exascale Computing Project (ECP) Software Technology focus area
is developing an HPC software ecosystem that will enable the efficient
and performant execution of exascale applications. Through the
Extreme-scale Scientific Software Stack (E4S), it is developing a
comprehensive and coherent software stack that will enable application
developers to productively write highly parallel applications that can
portably target diverse exascale architectures - including the IBM
OpenPOWER with NVIDIA GPU systems. E4S features a broad collection of
HPC software packages including the TAU Performance System(R) for
performance evaluation of HPC and AI/ML codes. TAU is a versatile
profiling and tracing toolkit that supports performance engineering of
codes written for CPU and GPUs and has support for most IBM platforms.
This talk will give an overview of TAU and E4S and how developers can
use these tools to analyze the performance of their codes. TAU supports
transparent instrumentation of codes without modifying the application
binary. The talk will describe TAU's support for CUDA, OpenACC, pthread,
OpenMP, Kokkos, and MPI applications. It will describe TAU's use for
Python based frameworks such as Tensorflow and PyTorch. It will cover
the use of TAU in E4S containers using Docker and Singularity runtimes
under ppc64le. E4S provides both source builds through the Spack
platform and a set of containers that feature a broad collection of HPC
software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC users.
Have you ever wondered how to speed up your code in Python? This presentation will show you how to start. I will begin with a guide on how to locate performance bottlenecks and then give you some tips on how to speed up your code. I would also like to discuss how to avoid premature optimization, as it may be ‘the root of all evil’ (at least according to D. Knuth).
Krzysztof Mazepa - Netflow/cflow - ulubionym narzędziem operatorów SPPROIDEA
Netflow is a widely used tool by network operators to monitor network traffic. It works by collecting IP traffic flow information from routers and switches. This flow information can then be used for various purposes such as monitoring network applications and users, network planning, identifying attacks and security threats, usage in billing systems, and analyzing traffic at peering points between operators. The presentation discusses the benefits of using Netflow/cflow mechanisms for network operators and aims to start a discussion on how it can be utilized in service provider and enterprise networks.
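The idea of "flow information" in the description above can be sketched in a few lines: packets that share the same addresses, ports, and protocol are counted together as one flow. This is a simplified illustration only. The packet dictionaries and the function name `aggregate_flows` are assumptions made for the example, and real exporters use additional key fields and timeouts that are not modelled here.

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packets by 5-tuple and count packets and bytes per flow."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        # The flow key: source, destination, ports, and protocol.
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["length"]
    return dict(flows)


# Hypothetical sample traffic: two packets of one TCP flow, one DNS query.
SAMPLE = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 40000, "dport": 443, "proto": "tcp", "length": 1500},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 40000, "dport": 443, "proto": "tcp", "length": 500},
    {"src": "10.0.0.3", "dst": "10.0.0.9", "sport": 5353, "dport": 53, "proto": "udp", "length": 80},
]

FLOWS = aggregate_flows(SAMPLE)
```

Per-flow counters of this kind are what a collector can then use for the purposes listed above, such as billing, planning, or spotting unusual traffic.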
Developed for DANS-KNAW. This presentation covers some of the fundamentals of the automation-tools, a set of helper scripts for automating transfers in Archivematica. It is designed to complement the API slide-deck, and the two resources can probably be consumed in any order. Knowing the API will help you understand the automation-tools, but knowing the automation-tools may help you understand what you want to create using the API.
API slide-deck here: https://www.slideshare.net/Archivematica/introduction-to-the-archivematica-api-september-2018-122548752
Here are some useful GDB commands for debugging:
- break <function> - Set a breakpoint at a function
- break <file:line> - Set a breakpoint at a line in a file
- run - Start program execution
- next/n - Step over to next line, stepping over function calls
- step/s - Step into function calls
- finish - Step out of current function
- print/p <variable> - Print value of a variable
- backtrace/bt - Print the call stack
- info breakpoints/i b - List breakpoints
- delete <breakpoint#> - Delete a breakpoint
- layout src - Switch layout to source code view
- layout asm - Switch layout to assembly view
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2IDXhIf.
Changhoon Kim talks about the new PISA ASICs, which promise multi-Tb/s packet processing with uncompromised programmability, and P4, a new domain-specific high-level language designed for networking. He shows how PISA and P4 will change the way we design, build, and run not just our networks, but also distributed systems and applications. Filmed at qconsf.com.
Changhoon Kim is a Director of System Architecture at Barefoot Networks. Prior to Barefoot, he worked at Windows Azure, Microsoft’s cloud-service division, and led engineering and research projects on the architecture, performance, and management of datacenter networks.
This presentation gives an overview of the Data Plane Development Kit (DPDK) and its basics. It is part of a Network Programming Series.
First, the presentation focuses on the network performance challenges on modern systems by comparing modern CPUs with modern 10 Gbps Ethernet links. Then it touches on the memory hierarchy and kernel bottlenecks.
The following part explains the main DPDK techniques, like polling, bursts, hugepages and multicore processing.
The DPDK overview explains how a DPDK application is initialized and run, and touches on lockless queues (rte_ring), memory pools (rte_mempool), memory buffers (rte_mbuf), hashes (rte_hash), cuckoo hashing, the longest prefix match library (rte_lpm), poll mode drivers (PMDs) and the kernel NIC interface (KNI).
At the end, there are a few DPDK performance tips.
Tags: access time, burst, cache, dpdk, driver, ethernet, hub, hugepage, ip, kernel, lcore, linux, memory, pmd, polling, rss, softswitch, switch, userspace, xeon
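Of the building blocks listed in this overview, longest prefix match is the easiest to show in isolation. The sketch below illustrates the lookup rule only, in Python with the standard `ipaddress` module. It does not use or imitate the rte_lpm API, and the `LpmTable` class is a hypothetical name for this example. A production library would replace the linear scan with a table- or trie-based structure.

```python
import ipaddress


class LpmTable:
    """Toy longest-prefix-match table using a linear scan (illustrative only)."""

    def __init__(self):
        self._routes = []  # list of (network, next_hop) pairs

    def add(self, cidr, next_hop):
        self._routes.append((ipaddress.ip_network(cidr), next_hop))

    def lookup(self, addr):
        """Return the next hop of the most specific matching prefix, or None."""
        ip = ipaddress.ip_address(addr)
        best = None
        for net, hop in self._routes:
            if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, hop)
        return best[1] if best else None


table = LpmTable()
table.add("0.0.0.0/0", "default")
table.add("10.0.0.0/8", "A")
table.add("10.1.0.0/16", "B")
```

With these three routes, an address inside 10.1.0.0/16 resolves to "B" even though it also matches the /8 and the default route, because the /16 is the longest matching prefix.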
This document provides instructions for setting up a NetFlow collection and analysis system using Logstash, Elasticsearch, and Kibana. It explains what NetFlow is and discusses using Mikrotik routers and Logstash to collect NetFlow data. Logstash would process and index the NetFlow data into Elasticsearch for storage. Kibana could then be used to visualize and analyze the NetFlow data from Elasticsearch. The document concludes by providing step-by-step configuration instructions for Logstash, Elasticsearch, and Kibana to enable NetFlow collection, storage, and analysis.
The document describes a Cisco Live 2014 presentation on advanced troubleshooting of Cisco Nexus 7000 series switches. It includes an agenda that covers system, data plane, and control plane troubleshooting over 120 minutes. It also discusses strategies, tools, and techniques for troubleshooting these different areas. Some key tools highlighted include show commands, scripts like SystemCheck, packet capture with ELAME, and analyzing logs. The presentation provides guidance on approaches for each troubleshooting area and highlights the extensive logging capabilities of NX-OS.
Apache Pulsar Development 101 with PythonTimothy Spann
Apache Pulsar Development 101 with Python PS2022_Ecosystem_v0.0
There is always the fear that a speaker cannot make it. Since I was the MC for the ecosystem track, I put together a talk just in case.
Here it is. Never seen or presented.
VampirTrace provides instrumentation and run-time measurement capabilities. It allows for automatic, manual, and binary instrumentation. Run-time measurement includes collecting trace data behind the scenes and post-processing. Users have options to configure various settings like environment variables, hardware performance counters, memory allocation counters, filtering, and grouping. FAQ and troubleshooting information is also available.
This webinar explains why PISA chips are inevitable, provides an overview of the machine architecture of such switches, presents a brief primer on the P4 language with sample programs for a variety of networks, and demonstrates a powerful network diagnostics application implemented in P4.
Programmability in SDNs is confined to the network control plane. The forwarding plane is still largely dictated by fixed-function switching chips. Our goal is to change that, and to allow programmers to define how packets are to be processed all the way down to the wire.
This is made possible by a new generation of high-performance forwarding chips. At the high-end, PISA (Protocol-Independent Switch Architecture) chips promise multi-Tb/s of packet processing. At the mid- and low-end of the performance spectrum, CPUs, GPUs, FPGAs, and NPUs already offer great flexibility with performance of a few tens to hundreds of Gb/s.
In addition to programmable forwarding chips, we also need a high-level language to dictate the forwarding behavior in a target-independent fashion. "P4" (www.p4.org) is such a language. In P4, the programmer declares how packets are to be processed, and a compiler generates a configuration for a PISA chip, or a programmable target in general. For example, the programmer might program the switch to be a top-of-rack switch, a firewall, or a load-balancer; and might add features to run automatic diagnostics and novel congestion control algorithms.
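The notion of declaring how packets are processed can be pictured as a match-action table: a key taken from the packet selects an action and its argument. The following is a rough Python analogy, not P4 code, and every name in it (`make_pipeline`, the `"forward"` and `"drop"` actions, the packet fields) is a hypothetical choice for illustration.

```python
def make_pipeline(table, default_action):
    """Build a packet-processing function from an exact-match table."""
    def process(packet):
        # Match on the destination address, fall back to the default action.
        action, arg = table.get(packet["dst"], default_action)
        if action == "forward":
            return {**packet, "egress_port": arg}
        return None  # any other action is treated as "drop"
    return process


# Table entries would normally be installed by a control plane at run time.
forwarding = make_pipeline(
    table={"10.0.0.1": ("forward", 1), "10.0.0.2": ("forward", 2)},
    default_action=("drop", None),
)
```

Changing the table contents or the default action changes the device's behavior, for example from a simple forwarder to a filter that drops everything not explicitly allowed, without changing the processing function itself.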
Tungsten Fabric provides a network fabric connecting all environments and clouds. It aims to be the most ubiquitous, easy-to-use, scalable, secure, and cloud-grade SDN stack. It has over 300 contributors and 100 active developers. Recent improvements include better support for microservices, containers, ingress/egress policies, and load balancing. It can provide consistent security and networking across VMs, containers, and bare metal.
Similar to (Open) MPI, Parallel Computing, Life, the Universe, and Everything (20)
MPI Sessions: a proposal to the MPI ForumJeff Squyres
This document discusses proposals for improving MPI (Message Passing Interface) to allow for more flexible initialization and usage of MPI functionality. The key proposals are:
1. Introduce the concept of an "MPI session" which is a local handle to the MPI library that allows multiple sessions within a process.
2. Query the underlying runtime system to get static "sets" of processes and create MPI groups and communicators from these sets across different sessions.
3. Split MPI functions into two categories - those that initialize/query/destroy objects and those for performance-critical communication/collectives. The former category would initialize MPI transparently.
4. Remove the requirement for MPI_Init() and MPI
This document summarizes a presentation by Martin Schulz on MPI 3.1 and plans for MPI 4.0. It discusses the current state of MPI 3.1 including features and implementation status. It provides updates from various MPI Forum working groups, including fault tolerance, hybrid programming, persistence, point-to-point communication, one-sided communication, and tools. It discusses the ratification of MPI 3.0 and 3.1 and plans for MPI 4.0. Several active working groups for MPI 4.0 features are listed.
Cisco's journey from Verbs to LibfabricJeff Squyres
This document summarizes Cisco's transition from using the Verbs API to using the Libfabric API for their usNIC network interface card. The Verbs API has limitations that make it difficult to support Ethernet features. Libfabric addresses these issues and more closely matches Cisco's hardware. Performance tests show Libfabric outperforming Verbs. Open MPI was adapted to support Libfabric through new plugins. This allows Libfabric to be used for both provider-specific and portable communication, benefiting MPI implementations. Cisco believes Libfabric is the best path forward as it matches their hardware, has performance benefits, and features MPI implementations have wanted.
(Very) Loose proposal to revamp MPI_INIT and MPI_FINALIZEJeff Squyres
This is one of two mini-talks that I gave at Euro MPI 2015 / Bordeaux. It's mainly a taste of the kinds of discussions that we have at the MPI Forum. This particular talk is about some thoughts I've had about revamping MPI_INIT and MPI_FINALIZE. It is by NO means a finalized proposal -- it's mainly to give you an idea of the scope of ideas that are routinely discussed at the Forum. ...hey, you should attend MPI Forum meetings and see for yourself!
This document summarizes feedback from the MPI community on requirements for the network layer. It discusses what MPI needs from the network layer including messages, efficient APIs, asynchronous progress, and scalability to millions of peers. It outlines features the MPI community likes in verbs, such as different communication modes, RDMA, and atomic operations. It also describes additional features wanted, such as non-blocking operations, buffer specifications as parameters, and standalone send/receive channels. The document was presented to the OpenFabrics libfabric working group to inform the design.
Cisco usNIC: how it works, how it is used in Open MPIJeff Squyres
The document discusses Cisco's usNIC (Cisco Userspace NIC) and Cisco's entry into the server market through its UCS (Cisco Unified Computing System) servers. It provides details on Cisco's 1U and 2U Intel-based servers, which provide low-latency 10 and 40Gb Ethernet connectivity. The document also summarizes how Cisco has achieved the #2 market share position in the blade server market due to customers' demand for data center innovation and Cisco UCS's record-setting performance benchmarks.
The document discusses Cisco's offerings for high performance computing (HPC) clusters, including Cisco UCS servers in various form factors, Cisco Nexus switches with low latency Ethernet, and Cisco's Virtual Interface Card (VIC) with the usNIC driver. It provides performance results showing ultra low latency of 2.16 microseconds for small Open MPI messages and 89.69% efficiency on High Performance Linpack across 512 cores on a 32 node cluster. Cisco positions its unified computing system as providing industry-leading compute performance for HPC workloads without compromise.
The document discusses the history and development of the MPI standard for parallel programming. It describes how MPI was developed in the early 1990s to create a common standard for message passing programming that could unite the various proprietary interfaces that existed at the time. The first MPI standard was released in 1994 after several years of development and input from vendors, national labs, and researchers. MPI was quickly adopted due to a reference implementation and its ability to provide a portable abstraction while allowing for high-performance implementations.
These are the slides that I presented at MOSSCon 2013 (slightly edited, because the original slides contained some animations that I morphed to look ok on Slideshare).
The general talk is about two things:
1. General philosophy of open source at Cisco.
2. My specific open source work at Cisco.
Enjoy!
This document proposes a new type of MPI request called a timer request. A timer request can be created using MPI_TIMER_CREATE, which takes a double specifying the time at which the request will complete. Timer requests function like other requests in that they can be tested, waited on, canceled, and freed. They provide a way to break out of MPI wait calls after a specified time even if no other requests have completed. Several use cases are described, including using timers with MPI_WAIT, MPI_WAITANY, MPI_WAITALL, and restarting a timer request in a loop. It is suggested that text about timers could go in the environment control or point-to-point communication chapters.
MPI_MPROBE eliminates race conditions between probing and receiving messages. It is useful for event-based and multi-threaded MPI applications where probing is used to determine message properties like size before receiving. MPI_MPROBE guarantees the corresponding receive will retrieve the probed message, even if there are delays, by removing the message from the queue after a successful probe. This protects against races where another thread could receive the message first.
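The race described above can be shown with a toy model of an incoming-message queue. This is plain Python, not MPI: the `Mailbox` class and its methods are assumptions that only mimic the behavior of a non-removing probe versus a matched probe (in MPI-3 the corresponding calls are MPI_Mprobe and MPI_Mrecv). The interleaving of two receivers is written out sequentially so the outcome is deterministic.

```python
from collections import deque


class Mailbox:
    """Toy incoming-message queue (a model, not the MPI API)."""

    def __init__(self, messages):
        self._queue = deque(messages)

    def probe(self):
        """Peek at the next message; it stays in the queue."""
        return self._queue[0] if self._queue else None

    def recv(self):
        """Receive whatever is at the head of the queue right now."""
        return self._queue.popleft()

    def mprobe(self):
        """Matched probe: remove the message and hand it to the caller."""
        return self._queue.popleft() if self._queue else None


def racy_receive():
    box = Mailbox(["big-message", "small-message"])
    seen = box.probe()    # receiver A inspects the message to size a buffer
    stolen = box.recv()   # receiver B gets in first and takes that message
    got = box.recv()      # receiver A now receives a different message
    return seen, stolen, got


def matched_receive():
    box = Mailbox(["big-message", "small-message"])
    handle = box.mprobe()  # receiver A: the message leaves the queue
    other = box.recv()     # receiver B can only get the next message
    return handle, other
```

In the first function the message that was probed is not the one that is later received. In the second, the probed message is already out of reach of any competing receiver.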
The Message Passing Interface (MPI) in Layman's TermsJeff Squyres
Introduction to the basic concepts of what the Message Passing Interface (MPI) is, and a brief overview of the Open MPI open source software implementation of the MPI specification.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might have in common the fact that both are building blocks, or dependencies of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more than that in common.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of open, standard formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations, and training activities. She previously worked on LibreOffice migrations and training courses for several public administrations and private organizations. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (the source of her nickname, deneb_alpha).
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
Van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
6. Fun stats
• ohloh.net says:
§ 819,741 lines of code
§ Average 10-20 committers at a time
§ “Well-commented source code”
• I rank in top-25 ohloh stats for:
§ C
§ Automake
§ Shell script
§ Fortran (…ouch)
7. Current status
• Version 1.6.5 / stable series
§ Unlikely to see another release
• Version 1.7.3 / feature series
§ v1.7.4 due (hopefully) by end of 2013
§ Plan to transition to v1.8 in Q1 2014
8. MPI conformance
• MPI-2.2 conformant as of v1.7.3
§ Finally finished several 2.2 issues that no one
really cares about
• MPI-3 conformance just missing new RMA
§ Tracked on wiki:
https://svn.open-mpi.org/trac/ompi/wiki/MPIConformance
§ Hope to be done by v1.7.4
9. New MPI-3 features
• Mo’ betta Fortran bindings
§ You should “use mpi_f08”. Really.
• Matched probe
• Sparse and neighborhood collectives
• “MPI_T” tools interface
• Nonblocking communicator duplication
• Noncollective communicator creation
• Hindexed block datatype
10. New Open MPI features
• Better support for more runtime systems
§ PMI2 scalability, etc.
• New generalized processor affinity system
• Better CUDA support
• Java MPI bindings (!)
• Transports:
§ Cisco usNIC support
§ Mellanox MXM2 and hcoll support
§ Portals 4 support
11. My new favorite random feature
• mpirun CLI option <tab> completion
§ Bash and zsh
§ Contributed by Nathan Hjelm, LANL
shell$ mpirun --mca btl_usnic_<tab>
btl_usnic_cq_num            -- Number of completion queue…
btl_usnic_eager_limit       -- Eager send limit (0 = use …
btl_usnic_if_exclude        -- Comma-delimited list of de…
btl_usnic_if_include        -- Comma-delimited list of de…
btl_usnic_max_btls          -- Maximum number of usNICs t…
btl_usnic_mpool             -- Name of the memory pool to…
btl_usnic_prio_rd_num       -- Number of pre-posted prior…
btl_usnic_prio_sd_num       -- Maximum priority send desc…
btl_usnic_priority_limit    -- Max size of "priority" mes…
btl_usnic_rd_num            -- Number of pre-posted recei…
btl_usnic_retrans_timeout   -- Number of microseconds bef…
btl_usnic_rndv_eager_limit  -- Eager rendezvous limit (0 …
btl_usnic_sd_num            -- Maximum send descriptors t…
12. Two features to discuss
in detail…
1. “MPI_T” interface
2. Flexible process affinity system
14. MPI_T interface
• Added in MPI-3.0
• So-called “MPI_T” because all the
functions start with that prefix
§ T = tools
• APIs to get/set MPI implementation values
§ Control variables (e.g., implementation
tunables)
§ Performance variables (e.g., run-time stats)
15. MPI_T control variables (“cvar”)
• Another interface to MCA param values
• In addition to existing methods:
§ mpirun CLI options
§ Environment variables
§ Config file(s)
• Allows tools / applications to
programmatically list all OMPI MCA params
16. MPI_T cvar example
• MPI_T_cvar_get_num()
§ Returns the number of control variables
• MPI_T_cvar_get_info(index, …) returns:
§ String name and description
§ Verbosity level (see next slide)
§ Type of the variable (integer, double, etc.)
§ Type of MPI object (communicator, etc.)
§ “Writability” scope
17. Verbosity levels
Level name      Level description
USER_BASIC      Basic information of interest to users
USER_DETAIL     Detailed information of interest to users
USER_ALL        All remaining information of interest to users
TUNER_BASIC     Basic information of interest for tuning
TUNER_DETAIL    Detailed information of interest for tuning
TUNER_ALL       All remaining information of interest to tuning
MPIDEV_BASIC    Basic information for MPI implementers
MPIDEV_DETAIL   Detailed information for MPI implementers
MPIDEV_ALL      All remaining information for MPI implementers
18. Open MPI interpretation of verbosity levels
• Audience:
1. User
§ Parameters required for correctness
§ As few as possible
2. Tuner
§ Tweak MPI performance
§ Resource levels, etc.
3. MPI developer
§ For Open MPI devs
• Level of detail:
1. Basic
§ Even for less-advanced users and tuners
2. Detailed
§ Useful but you won’t need to change them often
3. All
§ Anything else
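The nine level names on the previous slide are just the three audiences crossed with the three levels of detail. A minimal Python sketch of that grid follows. The `visible` helper and the idea of using table order as a display threshold are assumptions made for illustration, not something the slides or the MPI standard prescribe.

```python
from itertools import product

AUDIENCES = ["USER", "TUNER", "MPIDEV"]
DETAILS = ["BASIC", "DETAIL", "ALL"]

# Audience is the outer loop, matching the order of the table.
LEVELS = [f"{a}_{d}" for a, d in product(AUDIENCES, DETAILS)]

def visible(level, threshold):
    """True if `level` is at or below `threshold` in table order.

    Hypothetical filter: a tool could hide variables above a threshold.
    """
    return LEVELS.index(level) <= LEVELS.index(threshold)

print(LEVELS)
print(visible("USER_DETAIL", "TUNER_BASIC"))
print(visible("MPIDEV_BASIC", "TUNER_ALL"))
```

In real MPI code the corresponding constants carry the `MPI_T_VERBOSITY_` prefix (for example `MPI_T_VERBOSITY_USER_BASIC`).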
19. “Writeability” scope
Level name   Level description
CONSTANT     Read-only, constant value
READONLY     Read-only, but the value may change
LOCAL        Writing is local operation
GROUP        Writing must be done as a group, and all values must be consistent
GROUP_EQ     Writing must be done as a group, and all values must be exactly the same
ALL          Writing must be done by all processes, and all values must be consistent
ALL_EQ       Writing must be done by all processes, and all values must be exactly the same
20. Reading / writing a cvar
• MPI_T_cvar_handle_alloc(index, handle, …)
§ Allocates an MPI_T handle
§ Binds it to a specific MPI handle (e.g., a
communicator), or BIND_NO_OBJECT
• MPI_T_cvar_read(handle, buf)
• MPI_T_cvar_write(handle, buf)
→ OMPI has very, very few writable control variables after MPI_INIT
21. MPI_T Performance variables (“pvar”)
• New information available from OMPI
§ Run-time statistics of implementation details
§ Similar interface to control variables
• Not many available in OMPI yet
• Cisco usnic BTL exports 24 pvars
§ Per usNIC interface
§ Stats about underlying network
(more details to be provided in usNIC talk)
23. Locality matters
• Goals:
§ Minimize data transfer distance
§ Reduce network congestion and contention
• …this also matters inside the server, too!
24. [Figure: hwloc topology diagram of the example server]
• Machine (128GB): Intel Xeon E5-2690 (“Sandy Bridge”), 2 sockets, 8 cores, 64GB per socket
• NUMANode P#0 / Socket P#0 and NUMANode P#1 / Socket P#1: each has one L3 (20MB) and eight cores (Core P#0 to P#7)
• Each core: L2 (256KB), L1d (32KB), L1i (32KB), two PUs
§ Socket P#0: PU P#0-7 and P#16-23
§ Socket P#1: PU P#8-15 and P#24-31
• PCI devices shown: eth0-eth3 (8086:1521), eth4-eth7 (1137:0043), 102b:0522, disks sda and sdb (1000:005b)
• Indexes: physical
• Date: Mon Jan 28 10:51:26 2013
25. [Figure: the same hwloc topology diagram (Machine (128GB), Intel Xeon E5-2690 “Sandy Bridge”, 2 sockets, 8 cores, 64GB per socket), annotated with callouts]
• L1 and L2 (per-core caches)
• Shared L3 (one 20MB L3 per socket)
• Hyperthreading enabled (two PUs per core)
• 1G NICs: eth0-eth3 (PCI 8086:1521)
• 10G NICs: eth4-eth7 (PCI 1137:0043)
26. A user’s playground
The intent of this work is to provide a mechanism that
allows users to explore the process-placement space
within the scope of their own applications.
28. LAMA
• Supports a wide range of regular mapping
patterns
§ Drawn from much prior work
§ Most notably, heavily inspired by BlueGene/P
and /Q mapping systems
29. Launching MPI applications
• Three steps in MPI process placement
1. Mapping
2. Ordering
3. Binding
• Let's discuss how these work in Open MPI
31. Mapping
• MPI's runtime must create a map, pairing
processes-to-processors (and memory).
• Basic technique:
§ Gather hwloc topologies from allocated nodes.
§ Mapping agent then makes a plan for which
resources are assigned to processes
32. Mapping agent
• Act of planning mappings:
§ Specify which process will be launched on
each server
§ Identify if any hardware resource will be
oversubscribed
• Processes are mapped to the resolution of
a single processing unit (PU)
§ Smallest unit of allocation: hardware thread
§ In HPC, usually the same as a processor core
33. Oversubscription
• Common / usual definition:
§ When a single PU is assigned more than one
process
• Complicating the definition:
§ Some applications may need more than one PU
per process (multithreaded applications)
• How can the user express what their
application means by “oversubscription”?
36. Ordering
• Each process must be assigned a unique
rank in MPI_COMM_WORLD
• Two common types of ordering:
§ natural
• The order in which processes are mapped
determines their rank in MCW
§ sequential
• The processes are sequentially numbered starting
at the first processing unit, and continuing until the
last processing unit
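A small Python sketch can make the difference between the two orderings concrete. It assumes a hypothetical mapping of four processes placed round-robin over two sockets, and it interprets "first processing unit to last" as sorting by (socket, core); both are illustrative assumptions, not Open MPI internals.

```python
def natural_ranks(mapping_order):
    """Rank = position in the order the mapper placed the processes."""
    return {pu: rank for rank, pu in enumerate(mapping_order)}

def sequential_ranks(mapping_order):
    """Rank = position when walking processing units from first to last."""
    return {pu: rank for rank, pu in enumerate(sorted(mapping_order))}

# Hypothetical mapping: 4 processes placed round-robin over 2 sockets;
# each entry is (socket, core) in the order the mapper chose it.
mapping_order = [(0, 0), (1, 0), (0, 1), (1, 1)]

natural = natural_ranks(mapping_order)
sequential = sequential_ranks(mapping_order)
for pu in sorted(mapping_order):
    print(pu, "natural:", natural[pu], "sequential:", sequential[pu])
```

With this mapping, the process on socket 1, core 0 is rank 1 under natural ordering but rank 2 under sequential ordering.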
38. Binding
• Process-launching agent working with the
OS to limit where each process can run:
1. No restrictions
2. Limited set of restrictions
3. Specific resource restrictions
• “Binding width”
§ The number of PUs to which a process is
bound
39. Command Line Interface (CLI)
• 4 levels of abstraction for the user
§ Level 1: None
§ Level 2: Simple, common patterns
§ Level 3: LAMA process layout regular patterns
§ Level 4: Irregular patterns (not described in
this talk)
40. CLI: Level 1 (none)
• No mapping or binding options specified
§ May or may not specify the number of
processes to launch (-np)
§ If not specified, default to the number of cores
available in the allocation
§ One process is mapped to each core in the
system in a "by-core" style
§ Processes are not bound
• …for backwards compatibility reasons :-(
41. CLI: Level 2 (common)
• Simple, common patterns for mapping and
binding
§ Specify mapping pattern with
• --map-by X (e.g., --map-by socket)
§ Specify binding option with:
• --bind-to Y (e.g., --bind-to core)
§ All of these options are translated to Level 3
options for processing by LAMA
(full list of X / Y values shown later)
42. CLI: Level 3 (regular patterns)
• LAMA process layout regular patterns
§ Power users wanting something unique for
their application
§ Four MCA run-time parameters
• rmaps_lama_map: Mapping process layout
• rmaps_lama_bind: Binding width
• rmaps_lama_order: Ordering of MCW ranks
• rmaps_lama_mppr: Maximum allowable number of
processes per resource (oversubscription)
43. rmaps_lama_map (map)
• Takes as an argument the "process layout"
§ A series of nine tokens
• allowing 9! (362,880) mapping permutation options.
§ Preferred iteration order for LAMA
• innermost iteration specified first
• outermost iteration specified last
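To illustrate "innermost iteration specified first", here is a minimal Python sketch of iterating over a layout string. It is not LAMA's implementation. It borrows the letters `s` (socket) and `c` (core) from the mppr examples in this talk, and the machine shape (2 sockets, 4 cores each) and the `iterate_layout` helper are hypothetical.

```python
import itertools
import math

def iterate_layout(layout, counts):
    """Yield one {token: index} dict per slot, first token varying fastest."""
    # itertools.product varies its LAST argument fastest, so feed the
    # tokens in reverse and the first token of `layout` becomes innermost.
    tokens = list(layout)
    ranges = [range(counts[t]) for t in reversed(tokens)]
    for combo in itertools.product(*ranges):
        yield dict(zip(reversed(tokens), combo))

# Hypothetical machine: 2 sockets ('s'), 4 cores per socket ('c').
counts = {"s": 2, "c": 4}

by_core = [(slot["s"], slot["c"]) for slot in iterate_layout("cs", counts)]
by_socket = [(slot["s"], slot["c"]) for slot in iterate_layout("sc", counts)]

print(by_core[:4])        # cores of socket 0 are filled first
print(by_socket[:4])      # consecutive slots alternate between sockets
print(math.factorial(9))  # number of orderings of nine distinct tokens
```

Swapping two letters in the layout is enough to switch between "fill a socket first" and "round-robin across sockets", which is why nine tokens give so many distinct patterns.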
54. rmaps_lama_order (order)
• Select which ranks are assigned to
processes in MCW
[Figures: natural order for map-by-node (default); sequential order for any mapping]
• There are other possible orderings, but no
one has asked for them yet…
55. rmaps_lama_mppr (mppr)
• mppr (mip-per) sets the Maximum number
of allowable Processes Per Resource
§ User-specified definition of oversubscription
• Comma-delimited list of <#:resource>
§ 1:c → At most one process per core
§ 1:c,2:s → At most one process per core, and
at most two processes per socket
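The mppr syntax can be read as a set of per-resource limits. The following Python sketch parses such a string and checks a placement against it. The `parse_mppr` and `oversubscribed` helpers and the placement data are hypothetical, written only to illustrate the idea; they are not part of Open MPI.

```python
from collections import Counter

def parse_mppr(spec):
    """Parse 'N:resource,...' into {resource: max_processes}."""
    limits = {}
    for item in spec.split(","):
        count, resource = item.strip().split(":")
        limits[resource] = int(count)
    return limits

def oversubscribed(placements, limits):
    """Return the (resource, id) pairs holding more processes than allowed.

    `placements` is a list with one dict per process, mapping a resource
    letter to the id of the resource that process landed on.
    """
    usage = Counter()
    for proc in placements:
        for resource in limits:
            usage[(resource, proc[resource])] += 1
    return sorted(key for key, n in usage.items() if n > limits[key[0]])

limits = parse_mppr("1:c,2:s")
# Three processes on distinct cores, but all on socket 0.
# Core ids here are hypothetical and unique across the whole machine.
placements = [{"s": 0, "c": 0}, {"s": 0, "c": 1}, {"s": 0, "c": 2}]
print(limits)
print(oversubscribed(placements, limits))
```

Under `1:c` alone the same placement is acceptable; adding `2:s` makes socket 0 oversubscribed, which is how the user gets to define what "oversubscription" means for their application.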
56. MPPR
§ 1:c → At most one process per core
[Figure: hwloc topology of NUMANode P#0 / Socket P#0 of the example machine: L3 (20MB); Core P#0 to P#7, each with L2 (256KB), L1d (32KB), L1i (32KB); PU P#0-7 and P#16-23]
57. MPPR
§ 1:c,2:s → At most one process per core and
two processes per socket
[Figure: hwloc topology of NUMANode P#0 / Socket P#0 of the example machine: L3 (20MB); Core P#0 to P#7, each with L2 (256KB), L1d (32KB), L1i (32KB); PU P#0-7 and P#16-23]
61. Report bindings
• Displays prettyprint representation of the
binding actually used for each process.
§ Visual feedback = quite helpful when exploring
mpirun -np 4 --mca rmaps lama --mca rmaps_lama_bind 1c --mca rmaps_lama_map nbsch --mca rmaps_lama_mppr 1:c --report-bindings hello_world
MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
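The bracketed maps in the `--report-bindings` output are easy to read once the convention is clear: one bracket group per socket, cores separated by `/`, and one character per hardware thread. The Python sketch below reproduces that picture for the example machine. The `render_binding` helper is hypothetical, and it indexes cores per socket, which is an assumption of this sketch rather than how Open MPI numbers them.

```python
def render_binding(bound, sockets=2, cores=8, hwthreads=2):
    """Draw one bracket group per socket, '/'-separated cores,
    and one character per hardware thread ('B' = bound, '.' = not)."""
    groups = []
    for s in range(sockets):
        core_strs = []
        for c in range(cores):
            core_strs.append("".join(
                "B" if (s, c, h) in bound else "."
                for h in range(hwthreads)))
        groups.append("[" + "/".join(core_strs) + "]")
    return "".join(groups)

# A process bound to both hardware threads of core 0 on socket 0.
# Cores are indexed per socket here (an assumption of this sketch).
rank0 = {(0, 0, 0), (0, 0, 1)}
print(render_binding(rank0))
```

A binding width of one core on this hyperthreaded machine therefore shows up as `BB`: both hardware threads of the core are available to the process.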
62. Feedback
• Available in Open MPI v1.7.2 (and later)
• Open questions to users:
§ Are more flexible ordering options useful?
§ What common mapping patterns are useful?
§ What additional features would you like to
see?