This document provides an overview of a tutorial presentation on designing cloud and grid computing systems with InfiniBand and high-speed Ethernet. It discusses how InfiniBand and high-speed Ethernet were introduced to address bottlenecks in traditional protocols, I/O interfaces, and network speeds that were limiting scalability. InfiniBand aimed to alleviate all three bottlenecks, while high-speed Ethernet focused on faster network speeds and relied on other technologies for protocol processing and I/O interfaces. The presentation covers the standards bodies that defined InfiniBand and high-speed Ethernet, as well as an overview of the architectures and how they tackle communication bottlenecks.
The document provides an overview of InfiniBand essentials that every HPC expert must know. It discusses InfiniBand principles like fabric components, architecture, and discovery stages. It also covers protocol layers, Mellanox products, and implementations. The document is meant to educate professionals on InfiniBand fundamentals through topics like switches, adapters, cables, fabric management, and more.
The document compares Ethernet and InfiniBand networking technologies. It summarizes that InfiniBand provides more reliable transport than Ethernet through hardware-based retransmission and CRC checks. It also enables higher performance switching through cut-through routing and larger, lower cost switches compared to Ethernet technologies. InfiniBand further supports reliable direct memory access without TCP/IP overhead.
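The cut-through switching advantage mentioned above can be made concrete with a little arithmetic. The sketch below (illustrative Python; the frame size, header size, and link rates are assumptions for the example, not figures from the summarized documents) compares store-and-forward serialization delay, which waits for the full frame, with cut-through forwarding, which can start once the header has arrived.

```python
# Illustrative comparison of store-and-forward vs. cut-through switch latency.
# Frame size, header size, and link rates are assumptions, not document figures.

def serialization_delay_us(bytes_on_wire: int, link_gbps: float) -> float:
    """Time to clock a given number of bytes onto a link, in microseconds."""
    return bytes_on_wire * 8 / (link_gbps * 1e3)

FRAME_BYTES = 1500    # assumed full frame size
HEADER_BYTES = 64     # assumed bytes a cut-through switch inspects before forwarding

for gbps in (10, 40, 100):
    store_and_forward = serialization_delay_us(FRAME_BYTES, gbps)
    cut_through = serialization_delay_us(HEADER_BYTES, gbps)
    print(f"{gbps:>3} Gb/s: store-and-forward ~{store_and_forward:5.2f} us/hop, "
          f"cut-through ~{cut_through:5.3f} us/hop")
```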
Design Cloud system: InfiniBand vs. Ethernet - Patrik Kristel
A comparison between two technologies in Cloud system design. We focus on the advantages and disadvantages of InfiniBand and Ethernet in current Cloud systems and data center design.
The document provides an overview of InfiniBand, including its key components and advantages over traditional network protocols. It describes InfiniBand as a high-performance switched fabric interconnect that uses Remote Direct Memory Access to transfer data between nodes with low latency and CPU overhead. The primary elements of an InfiniBand network are outlined as Host Channel Adapters, switches, subnet managers, and optional gateways and routers.
1) InfiniBand is an industry standard channel-based architecture that provides high-speed, low-latency interconnects for distributed computing infrastructures.
2) It combines networks into a unified fabric that collectively routes data between host nodes and network peripherals, reducing required adapters and cables and lowering total cost of ownership.
3) InfiniBand is supported by multiple vendors and is well-suited for applications that require high bandwidth, low latency, and low processor overhead such as high performance computing, databases, and heavily interconnected servers.
InfiniBand is a high-performance switched fabric interconnect technology used in high-performance computing and enterprise data centers to provide high throughput, low latency connections between processor nodes and storage devices. It originated from the merger of the Future I/O and Next Generation I/O specifications in 1999. InfiniBand features include switched fabric topology, direct memory access, quality of service, multiple link speeds, and remote DMA support. It is well-suited as an interconnect technology for storage components, application servers, and connectivity between different layers.
InfiniBand is a high-performance interconnect used in high-performance computing. It uses switched fabric topology to allow for high throughput, low latency, and reliability. InfiniBand cables and components allow for data transfer rates from 2 Gbit/s to 120 Gbit/s. It uses a layered architecture including physical, link, network, and transport layers to reliably transfer data. InfiniBand provides superior performance over Ethernet and supports scalability and fault tolerance making it well-suited for high-performance computing environments.
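The 2 Gbit/s to 120 Gbit/s range quoted above follows directly from InfiniBand's lane-based signaling. A short sketch using the classic SDR/DDR/QDR per-lane rates and 8b/10b line encoding (later generations use different encodings and are not modeled here):

```python
# Effective InfiniBand data rates for the classic generations that use 8b/10b encoding.
# SDR/DDR/QDR per-lane signaling rates are the standard published values; 1x/4x/12x
# are the usual link widths.

SIGNALING_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}  # per lane, raw signaling
ENCODING_EFFICIENCY = 8 / 10                             # 8b/10b line encoding
LANE_COUNTS = (1, 4, 12)

for gen, raw in SIGNALING_GBPS.items():
    for lanes in LANE_COUNTS:
        data_rate = raw * ENCODING_EFFICIENCY * lanes
        print(f"{gen} {lanes:>2}x: {data_rate:5.1f} Gbit/s of payload bandwidth")

# SDR 1x gives 2 Gbit/s of payload, while QDR 12x signals at 120 Gbit/s (96 Gbit/s of
# payload), which is where the "2 Gbit/s to 120 Gbit/s" range in the text comes from.
```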
The document discusses how Mellanox storage solutions can maximize data center return on investment through faster database performance, increased virtual machine density per server, and lower total cost of ownership. Mellanox's high-speed interconnect technologies like InfiniBand and RDMA can provide over 10x higher storage performance compared to traditional Ethernet and Fibre Channel solutions.
This document discusses how Mellanox technologies can accelerate big data solutions using RDMA. It summarizes that Mellanox provides end-to-end interconnect solutions including adapters, switches, and cables. It also discusses three key areas for acceleration: data analytics, storage, and distributed storage. The document presents the Unstructured Data Accelerator plugin which can double MapReduce performance using RDMA for efficient data shuffling. It also discusses using RDMA and SSDs to unlock higher throughput in HDFS and overcome bandwidth limitations of 1GbE and 10GbE networks.
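To put the 1GbE/10GbE bandwidth limitation mentioned above in perspective, here is a small back-of-the-envelope calculation. The 1 TB shuffle volume is an assumed figure, and the estimate ignores protocol overhead, disk I/O, and the CPU-offload benefit of RDMA that the plugin also exploits.

```python
# Rough time to move an assumed 1 TB MapReduce shuffle volume over different fabrics.
# The 1 TB figure is an illustration; real jobs vary widely, and this ignores protocol
# overhead, congestion, and storage bottlenecks.

SHUFFLE_BYTES = 1 * 10**12  # assumed shuffle volume: 1 TB

for name, gbps in (("1GbE", 1), ("10GbE", 10), ("40 Gb/s fabric", 40)):
    seconds = SHUFFLE_BYTES * 8 / (gbps * 10**9)
    print(f"{name:>15}: ~{seconds/60:6.1f} minutes just to move the data")
```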
Scalable Service-Oriented Middleware over IP - Dai Yang
ABSTRACT
Due to the increased amount of communication in cars, a reliable and easy-to-use middleware system for automotive applications has become a popular research field. In this paper, we review a recent approach: the Scalable Service-Oriented Middleware over IP (SOME/IP). We present current technologies and how SOME/IP differs from them. We point out how SOME/IP fits into the ISO/OSI layer model and discuss its service orientation. We also present the advantages and disadvantages of SOME/IP. In the end, we analyze its timing behavior and whether it is suitable for automotive software or not.
The panel discussed IPv6 support in customer edge (CE) routers from various vendors. Each vendor gave a brief introduction of their IPv6 program and products. Key topics discussed included supported IPv6 architectures (native, dual stack, tunneling), reasons for supporting transition mechanisms, thoroughness of native IPv6 support, customer and product types, plans to support new transition technologies and Home Networking, challenges with firmware upgrades, market demand, and areas the IETF could still address. The panel concluded by taking questions from the audience.
Insights on the configuration and performances of SOME/IP Service Discovery - Nicolas Navet
Scalable Service-Oriented Middleware on IP (SOME/IP) is a proposal aimed at providing service-oriented communication in vehicles. SOME/IP nodes are able to dynamically discover and subscribe to available services through the SOME/IP Service Discovery protocol (SOME/IP SD). In this context, a key performance criterion to achieve the required responsiveness is the subscription latency, that is, the time it takes for a client to subscribe to a service. In this paper we provide a recap of SOME/IP SD and list a number of assumptions based on what we can foresee about the use of SOME/IP in the automotive domain. Then, we identify the factors having an effect on the subscription latency and, by sensitivity analysis, quantify their importance regarding the worst-case service subscription latency. The analysis and experiments in this study provide practical insights into how to best configure the SOME/IP SD protocol.
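To give a feel for the timing parameters the paper analyzes, the sketch below estimates a crude upper bound on how long a client may wait for an OFFER it can react to. Parameter names follow the usual SOME/IP SD configuration vocabulary, but the values are made-up examples, and the model is a deliberate simplification that ignores FIND messages, network delays, and request-response latencies, so it is not the paper's analysis.

```python
# Very simplified upper bound on the wait for a usable OFFER, given server-side
# SOME/IP SD timing parameters. Values below are assumptions for illustration only.

def worst_case_offer_wait_ms(initial_delay_max_ms: float,
                             repetitions_base_delay_ms: float,
                             repetitions_max: int,
                             cyclic_offer_delay_ms: float) -> float:
    # Server behavior: wait up to initial_delay_max, then send OFFERs with doubling
    # gaps (base, 2*base, 4*base, ...) for repetitions_max rounds, then offer cyclically.
    repetition_phase = sum(repetitions_base_delay_ms * 2**i for i in range(repetitions_max))
    # A client that just missed a cyclic OFFER may wait one extra cyclic interval.
    return initial_delay_max_ms + repetition_phase + cyclic_offer_delay_ms

# Assumed example configuration (not values from the paper):
print(worst_case_offer_wait_ms(initial_delay_max_ms=100,
                               repetitions_base_delay_ms=30,
                               repetitions_max=3,
                               cyclic_offer_delay_ms=1000))  # -> 1310.0 ms
```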
Scalable Service Oriented Architecture for Audio/Video ... - Videoguy
The document summarizes research on developing a scalable service-oriented architecture for video conferencing using publish/subscribe event brokers to distribute real-time audio and video streams. It outlines the GlobalMMCS architecture, which separates media processing from delivery for scalability. Performance tests showed a single broker could support 1500 audio or 400 video participants, and distributed brokers improved scalability further.
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors - Michelle Holley
Speaker: Daniel Towner, System Architect for Wireless Access, Intel Corporation
5G brings many new capabilities over 4G, including higher bandwidths, lower latencies, and more efficient use of radio spectrum. However, these improvements require a large increase in computing power in the base station. Fortunately, the Xeon Scalable Processor series (Skylake-SP) recently introduced by Intel has a new high-performance instruction set called Intel® Advanced Vector Extensions 512 (Intel® AVX-512), which is capable of delivering the compute needed to support the exciting new world of 5G.
In his talk Daniel will give an overview of the new capabilities of the Intel AVX-512 instruction set and show why they are so beneficial to supporting 5G efficiently. The most obvious difference is that Intel AVX-512 has double the compute performance of previous generations of instruction sets. Perhaps surprisingly, though, it is the addition of brand-new instructions that can make the biggest improvements. The new instructions mean that software algorithms can become more efficient, thereby enabling even more effective use of the improvements in computing performance and leading to very high-performance 5G NR software implementations.
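The "double the compute performance" point comes largely from the wider vectors: 512-bit AVX-512 registers hold twice as many elements per instruction as AVX2's 256-bit registers. A trivial illustration:

```python
# Elements processed per vector instruction for common numeric types.
# The doubling of lanes per operation is the main source of the "double the compute"
# claim; the new instructions mentioned in the talk add further gains on top.

TYPE_BITS = {"float32": 32, "float64": 64, "int16": 16, "int8": 8}

for name, bits in TYPE_BITS.items():
    avx2_lanes = 256 // bits
    avx512_lanes = 512 // bits
    print(f"{name:>7}: AVX2 {avx2_lanes:>2} lanes/op  vs  AVX-512 {avx512_lanes:>2} lanes/op")
```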
This document discusses Mellanox's Efficient Virtual Network (EVN) solution for service providers. It begins with an overview of Mellanox's end-to-end interconnect solutions and portfolio. It then discusses how the cloud-native NFV architecture requires an efficient virtual network. The EVN is introduced as the foundation for efficient telco cloud infrastructure. The document provides details on how SR-IOV and DPDK can be used together with Mellanox NICs to achieve near line-rate performance without CPU overhead. It also discusses how overlay networks can be accelerated using overlay network accelerators in NICs. Benchmark results show the EVN approach achieving higher performance and lower CPU utilization compared to alternative solutions.
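The "near line-rate without CPU overhead" claim is easier to appreciate once you see how many packets per second line rate actually means. A quick calculation for minimum-size Ethernet frames (a standard worst case for packet processing; the per-frame wire overhead figures are the usual Ethernet constants, not values from the document):

```python
# Maximum packet rate for minimum-size Ethernet frames at common link speeds.
# 64-byte frame + 8-byte preamble/SFD + 12-byte inter-frame gap = 84 bytes on the wire.

WIRE_BYTES_PER_MIN_FRAME = 64 + 8 + 12

for gbps in (10, 25, 40, 100):
    pps = gbps * 1e9 / (WIRE_BYTES_PER_MIN_FRAME * 8)
    print(f"{gbps:>3} GbE: {pps/1e6:6.2f} Mpps of 64-byte packets at line rate")
```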
Madhu Rangarajan will provide an overview of networking trends they are seeing in the cloud, various network topologies and tradeoffs, and trends in the acceleration of packet-processing workloads. They will also talk about some of the work going on at Intel to address these trends, including FPGAs in the datacenter.
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN - nvirters
Synopsis
We will start with MPLS 101 and then look into MPLS-related OpenFlow actions. In the second half we will delve into the RouteFlow architecture and extend it to enable Label Distribution Protocol (LDP) and MPLS routing. We will conclude with a Mininet-based test bed switching traffic using MPLS labels instead of IP addresses.
This will be a hands-on workshop. VM images for VirtualBox will be provided. Attendees are expected to bring their laptops loaded with VirtualBox.
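As a taste of what the tutorial builds, the fragment below sketches an OpenFlow 1.3 flow entry that pushes an MPLS label onto matching IPv4 traffic. It uses the Ryu controller framework purely for illustration (the tutorial itself extends RouteFlow), assumes it is called from inside a running Ryu application that already holds a `datapath` handle, and the subnet, label value, and port number are made-up examples.

```python
# Sketch: install a flow that pushes MPLS label 100 onto IPv4 traffic destined to
# 10.0.2.0/24 and sends it out port 2. Illustrative only, using Ryu's OpenFlow 1.3
# bindings rather than the RouteFlow/LDP machinery the tutorial actually extends.

def install_mpls_push_flow(datapath):
    ofproto = datapath.ofproto
    parser = datapath.ofproto_parser

    match = parser.OFPMatch(eth_type=0x0800, ipv4_dst=("10.0.2.0", "255.255.255.0"))
    actions = [
        parser.OFPActionPushMpls(ethertype=0x8847),   # add an MPLS shim header
        parser.OFPActionSetField(mpls_label=100),     # assumed label value
        parser.OFPActionOutput(2),                    # assumed egress port
    ]
    inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS, actions)]
    mod = parser.OFPFlowMod(datapath=datapath, priority=10,
                            match=match, instructions=inst)
    datapath.send_msg(mod)
```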
About Vikram Dham
Vikram is the CTO and co-founder of Kamboi Technologies, LLC, where he advises networking companies, switch vendors and early adopters on SDN technology and distributed software development. He is also the founder of the Bay Area Network Virtualization (BANV) meet-up group, which brings together technologists in the SDN/NFV/NV domain for technical talks and workshops and creates a truly "open" platform for sharing knowledge.
He has used SDN technologies for building software related to traffic engineering, security and routing. In the past, he was the Principal Engineer at Slingbox, where he architected and built the distributed networking software for peer-to-peer connectivity of millions of endpoints. He holds an MS degree in EE with a specialization in Computer Networks from Virginia Tech and has worked on research projects with companies like ECI Telecom, Raytheon and Avaya Research Labs.
Introduction to SDN and Network Programmability - BRKRST-1014 | 2017/Las Vegas - Bruno Teixeira
Jason Davis, Distinguished Services Engineer, Cisco. Software-Defined Networking (SDN) is an exciting new approach to network IT Service Management. If you are trying to understand what SDN is and want to understand more about Controllers, APIs, Overlays, OpenFlow and ACI, then this introductory session is for you! We will cover the genesis of SDN, what it is, what it is not, and Cisco's involvement in this space. You may also be wondering what products and services are SDN-enabled and how you can solve your unique business challenges by enhancing and differentiating your services by leveraging network programmability. Cisco's SDN-enabled products and services will be explained, enabling you to consider your own implementations. Since SDN extends network flexibility and functionality, which impacts Network Engineering and Operations teams, we'll also cover the IT Service Management impact. Finally, we'll explore what skills and capabilities are needed to take advantage of SDN and Network Programmability. Network engineers, network operations staff, IT Service Managers, IT personnel managers, and application/compute SMEs will benefit from this session.
Vector packet technologies such as DPDK and FD.io/VPP revolutionized software packet processing, initially for discrete appliances and then for NFV use cases. Container-based VNF deployments and their supporting NFV infrastructure are now the new frontier in packet processing and have a number of strong advocates among both traditional comms service providers and in the cloud. This presentation will give an overview of how the DPDK and FD.io/VPP projects are rising to meet the challenges of the container dataplane. The discussion will provide an overview of the challenges, recent new features and what is coming soon in this exciting new area for the software dataplane, in both DPDK and FD.io/VPP!
About the speaker: Ray Kinsella has been working on Linux and various other open source technologies for about twenty years. He is recently active in open source communities such as VPP and DPDK but is a constant lurker in many others. He is interested in the software dataplane and optimization, virtualization, operating system design and implementation, communications and networking.
This document provides an overview of Extreme Networks' products for mobility including their network operating system ExtremeXOS, switch and router families like BlackDiamond, Summit, E4G and wireless products. It also discusses their management products like Ridgeline and support services like ExtremeWorks. The document highlights how Extreme Networks helps enable new visions of mobility with intelligent networks that can automatically adapt to conditions and provide personalized experiences.
Development, test, and characterization of MEC platforms with Teranium and Dr... - Michelle Holley
Mobile edge computing delivers cloud computing at the edge of the cellular network to drive service quality and innovation. The ability for CSPs and ISVs to effectively develop, deliver, and deploy MEC services on a given platform directly correlates with the availability and maturity of associated tools and test environments. Dronava is a hyper-connected, web-scale network reference design for the 5G mobile network, suitable for use as a test and development socket for cloud applications developed for MEC platforms with tools such as the Intel NEV SDK. With Dronava, developers can drive the application with real traffic from the network edge to the EPC core and, if need be, connect with services in the core network in order to fully characterize the functionality, latency, and throughput of the platform and application. Teranium is an integrated development environment that simplifies the development, packaging, and deployment/management of cloud applications. Teranium can be utilized to develop and deploy MEC applications on a number of platforms. Together with Dronava, Teranium helps to reduce complexity and improve efficiency in the ability of CSPs and ISVs to adopt and deploy MEC-based services.
Dr. Christos Kolias – Senior Research Scientist
Keynote Title: “NFV: Empowering the Network”
Keynote Abstract: Network Functions Virtualization (NFV) envisions and promises to change the service provider landscape and has emerged as one of today's significant trends. Although less than two years old, NFV has garnered the industry's full attention and support. Moving swiftly, a number of key accomplishments have already taken place, and a lot more work is currently under way within ETSI NFV as we embark on its future phase. Various proofs-of-concept (ranging from vEPC to vCPE, vIMS and vCDN) are being developed, while issues such as open source and SDN are becoming key ingredients as they can play a pivotal role.
Dr. Christos Kolias' Bio: Christos Kolias is a senior research scientist at Orange Silicon Valley (a subsidiary of Orange). Christos is a co-founder of the ETSI NFV group and led the formation of ONF's Wireless & Mobile working group. He has lectured on NFV and SDN at several events. Christos has more than 15 years of experience in networking; he is the originator of Virtual Output Queueing (VOQ), used in packet switching. He holds a Ph.D. in Computer Science from UCLA.
---------------------------------------------------
★ Resources ★
Zerista: http://lcu14.zerista.com/event/member/137765
Google Event: https://plus.google.com/u/0/events/cpeksim4hr4ghhuufv5ic4viirs
Video: https://www.youtube.com/watch?v=tFDnj_342n4&list=UUIVqQKxCyQLJS6xvSmfndLA
Etherpad: http://pad.linaro.org/p/lcu14-400a
---------------------------------------------------
★ Event Details ★
Linaro Connect USA - #LCU14
September 15-19th, 2014
Hyatt Regency San Francisco Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
Multi-operator "IPC" VPN Slices: Applying RINA to Overlay Networking - ARCFIRE ICT
This document discusses the concept of "IPC VPN slices" which provide distributed inter-process communication (IPC) between applications using the Recursive Internet Network Architecture (RINA). It describes how IPC VPN slices can be implemented across single and multiple domains/operators using RINA's distributed IPC facility (DIF) as an overlay. The objective is to provide an autonomous IPC VPN overlay and separation of concerns between the VPN and underlying L2 VPN fabric, as well as service continuity as endpoints attach across different access networks. It also shows how slice orchestration in this architecture provides recursive abstraction between different administrative domains.
Tech Talk by John Casey (CTO), CPLANE_NETWORKS: High Performance OpenStack Ne... - nvirters
OpenStack is HOT! No doubt about it. A recent survey by The New Stack and The Linux Foundation shows OpenStack as the most popular open source project ahead of other hot projects like Docker and KVM. OpenStack is now taking its rightful place as the open source cloud solution for enterprises and service providers.
To date OpenStack networking has not yet achieved the performance, scalability and reliability that many large enterprises demand. CPLANE NETWORKS solves that problem by delivering secure multi-tenant virtual networking that overcomes the limitations of the standard Neutron networking service. By making all networking services local to the compute node and achieving near line-rate throughput, CPLANE NETWORKS Dynamic Virtual Networks (DVN) delivers mega-scale networking for the most demanding application environments.
In this session John Casey will cover the basics of DVN and explain how CPLANE NETWORKS achieves "at scale" network performance within and across data centers.
About John Casey
John Casey has over 20 years of deep technology leadership. His proven success in a variety of technical leadership roles in telecom, enterprise and government, and in software design and development, provides the foundation for the system architecture and engineering team.
Previously John led worldwide deployment teams for both IBM’s Software Group and Narus, Inc. His work in large scale, high performance system design at Transarc Labs and Walker Interactive Systems brings leadership to the CPLANE NETWORKS product suite.
Install FD.IO VPP On Intel(r) Architecture & Test with Trex* - Michelle Holley
This demo/lab will guide you to install and configure FD.io Vector Packet Processing (VPP) on an Intel® Architecture (IA) server. You will also learn to install TRex* on another IA server to send packets to the VPP instance, and to use some VPP commands to forward packets back to TRex*.
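For readers who cannot attend, the rough shape of the VPP side of the lab looks like the hedged sketch below: driving the `vppctl` command-line tool from Python to bring interfaces up and route traffic back toward the generator. The interface names, addresses, and routes are illustrative assumptions, and the actual lab steps may differ.

```python
# Hedged sketch of a VPP configuration step: bring two interfaces up, assign addresses,
# and route the traffic-generator destination network back out the second interface.
# Names, addresses, and prefixes are assumptions, not the lab's actual values.
import subprocess

def vppctl(*args):
    """Run a VPP CLI command through vppctl and return its output."""
    return subprocess.run(["vppctl", *args], check=True,
                          capture_output=True, text=True).stdout

vppctl("set", "interface", "state", "TenGigabitEthernet2/0/0", "up")
vppctl("set", "interface", "state", "TenGigabitEthernet2/0/1", "up")
vppctl("set", "interface", "ip", "address", "TenGigabitEthernet2/0/0", "10.10.10.1/24")
vppctl("set", "interface", "ip", "address", "TenGigabitEthernet2/0/1", "10.10.20.1/24")
vppctl("ip", "route", "add", "48.0.0.0/8", "via", "10.10.20.2")  # back toward TRex
print(vppctl("show", "interface"))
```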
Speaker: Loc Nguyen. Loc is a Software Application Engineer in the Data Center Scale Engineering Team. Loc joined Intel in 2005 and has worked on various projects. Before joining the network group, Loc worked in the high-performance computing area and supported the Intel® Xeon Phi™ Product Family. His interests include computer graphics, parallel computing, and computer networking.
This document introduces programmable virtual networks and discusses their advantages over traditional network slicing. It describes FlowVisor, an early network slicing tool, and its limitations in providing full network virtualization. The document then introduces OpenVirteX, a new system that aims to provide complete programmable virtual networks through topology, address, and policy virtualization. OpenVirteX maps virtual and physical network elements and allows independent control of virtual networks. While still in development, OpenVirteX has the potential to enable more flexible and innovative virtualized networks than previous solutions.
Intel® Ethernet Series Delivering Real-World Value. As computing and networking scale in performance, interconnect technologies play a critical role in ensuring systems reach their full potential in the speed at which they move data. Intel has been at the forefront of research and development into interconnect technologies since the dawn of the PC era. Today in the data center, Intel is working to deliver greater levels of intelligence within its connectivity solutions to overcome network bottlenecks and accelerate applications. Between PC and peripherals, Intel is heavily involved with the industry as it brings the latest technologies to market for the best user experiences. At the chip level, Intel is leading the industry in advanced packaging with technologies that connect chiplets and modules in order to deliver Moore’s Law advances, while also working to reduce latency between memory and CPU. From “Microns to Miles,” Intel’s investments in interconnect technologies are among the broadest in the industry.
Google and Intel speak on NFV and SFC service delivery
The slides are as presented at the meet up "Out of Box Network Developers" sponsored by Intel Networking Developer Zone
Here is the Agenda of the slides:
How DPDK, RDT and gRPC fit into SDI/SDN, NFV and OpenStack
Key Platform Requirements for SDI
SDI Platform Ingredients: DPDK, Intel® RDT
gRPC Service Framework
Intel® RDT and gRPC service framework
This document is the first chapter of a Cisco training course on campus network design. It introduces common campus network architectures and best practices for design. The chapter discusses the access, distribution and core layers, and considers designs for small, medium and large networks. It also outlines the PPDIOO methodology for the network lifecycle and emphasizes the importance of careful planning based on a hierarchical design to support business needs during network evolution. The chapter concludes with two sample lab exercises.
Intel's presentation at Blue Line industrial computer seminar - Blue Line
This document provides an overview of Intel Corporation in 2014. It discusses Intel's mission to bring smart, connected devices to everyone using Moore's Law. Over 75% of Intel's business is outside the US, with key focus areas including data center, client, ultra-mobile, and wearables/IoT. Intel has a track record of executing Moore's Law and developing new process technologies like 14nm. The document outlines Intel's various business groups and labs focusing on areas like the internet of things. It provides a roadmap for Intel gateways for IoT and discusses Intel's history and position as the world's largest semiconductor manufacturer.
The document discusses an open networking approach called Openet that provides programmability of network devices. Openet allows for the deployment of active services on commercial network hardware through a programmable platform. It explores using the Alteon 780 series switch as a research platform, which provides L2-L7 filtering and powerful computation through integrated service delivery systems. The goal is to embed intelligence and computation within the network to enable new applications and services.
The number of internet-connected devices is growing exponentially, enabling an increasing number of edge applications in environments such as smart cities, retail, and industry 4.0. These intelligent solutions often require processing large amounts of data, running models to enable image recognition, predictive analytics, autonomous systems, and more. Increasing system workloads and data processing capacity at the edge is essential to minimize latency, improve responsiveness, and reduce network traffic back to data centers. Purpose-built systems such as Supermicro’s short-depth, multi-node SuperEdge, powered by 3rd Gen Intel® Xeon® Scalable processors, increase compute and I/O density at the edge and enable businesses to further accelerate innovation.
Join this webinar to discover new insights in edge-to-cloud infrastructures and learn how Supermicro SuperEdge multi-node solutions leverage data center scale, performance, and efficiency for 5G, IoT, and Edge applications.
Colt SDN Strategy - FIBRE Workshop 5 Nov 2013 Barcelona - Javier Benitez
Colt's vision is to integrate IT and network platforms through software-defined networking (SDN) and network functions virtualization (NFV) to deliver an integrated customer experience. Colt's strategy is to make networks programmable like computing through SDN/NFV. This includes separating the data and control planes, standardizing network abstractions, and virtualizing network functions. Colt's plan is to develop an SDN/NFV infrastructure including DC network virtualization, virtualized network functions, and a software-defined WAN.
The document discusses campus network design. It describes the common layers of campus networks - access, distribution and core layers. It also discusses small, medium and large campus network designs. The document introduces the PPDIOO (Prepare, Plan, Design, Implement, Operate, Optimize) methodology for network lifecycle management and design. It provides details on the different phases and benefits of the PPDIOO approach.
Get ready to dive into the exciting world of IoT data processing! 🌐📊
Join us for a thought-provoking webinar on "Processing: Turning IoT Data into Intelligence" hosted by industry visionary Deepak Shankar, founder of Mirabilis Design. Discover how to harness the potential of IoT devices by strategically choosing processors that optimize power, performance, and space.
In this engaging session, you'll explore key insights:
✅ Impact of processor architecture on Power-Performance-Area optimization
✅ Enabling AI and ML algorithms through precise compute and storage requirements
✅ Future trends in IoT hardware innovation
✅ Strategies for extending battery life and cost prediction through system design
Don't miss the chance to learn how to leverage a single IoT Edge processor for multiple applications and much more. This is your opportunity to gain a competitive edge in the evolving IoT landscape.
The document discusses network design concepts for building a resilient network. It emphasizes the importance of considering redundancy at multiple levels, from the physical infrastructure to network protocols. Well-designed networks are modular, have clearly defined functional layers, and incorporate redundancy through techniques like load balancing and diverse circuit paths. Hierarchical network designs with logical areas can also improve convergence times during failures.
Task allocation on many-core, multi-processor distributed systems - Deepak Shankar
Migration of software from single-core to multi-core, from single-thread to multi-thread, and into a distributed system requires knowledge of the system and its scheduling algorithms. The system consists of a combination of hardware, RTOS, network, and traffic profiles. Of the 100+ popular scheduling algorithms, the majority use First Come-First Served with priority and preemption, Weighted Round Robin, and slot-based scheduling. The task allocation must take into consideration a number of factors, including the hardware configuration, the RTOS scheduling, task dependency, parallel partitioning, shared resources, and memory access. Additionally, embedded system architectures always have the possibility of using custom hardware to implement tasks that may be associated with artificial intelligence, diagnostics or image processing.
In this webinar, we will show you how to conduct trade-offs using a system model of the tasks and the target resources. You will learn to make decisions based on the hardware and network statistics. The statistics will assist in identifying deadlocks, bottlenecks, possible failures and hardware requirements. To estimate the best task allocation and partitioning, a discrete-event simulation with both time- and quantity-shared resource modeling is essential. The software must be defined as a UML model or a task graph.
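Of the scheduling families named above, Weighted Round Robin is the easiest to show in a few lines. The sketch below allocates tasks to processing resources in proportion to per-core weights; the task list and weights are illustrative values, not figures from the webinar, and a real model would also capture the dependencies and shared resources discussed above.

```python
# Minimal weighted round-robin allocator: cores with higher weight receive
# proportionally more tasks per round. Weights and tasks are illustrative assumptions.
from itertools import cycle

def weighted_round_robin(tasks, core_weights):
    """Assign tasks to cores, giving each core `weight` slots per round."""
    schedule = {core: [] for core in core_weights}
    # Expand each core into `weight` slots, then deal tasks over the slots cyclically.
    slots = [core for core, weight in core_weights.items() for _ in range(weight)]
    for task, core in zip(tasks, cycle(slots)):
        schedule[core].append(task)
    return schedule

tasks = [f"task{i}" for i in range(12)]
print(weighted_round_robin(tasks, {"big_core": 2, "little_core_0": 1, "little_core_1": 1}))
```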
Web: www.mirabilisdesign.com
Webinar Youtube Link: https://youtu.be/ZrV39SYTWSc
The document provides information about building a small network including devices, applications, protocols, and connectivity verification. It discusses selecting devices for a small network based on factors like cost and speed. Common network applications and protocols used in small networks are also identified, including protocols for real-time voice and video. The document explains how a small network design can scale to support larger networks as business needs grow. Methods for verifying connectivity between devices using commands like ping and traceroute are presented. Finally, commands for viewing host IP configurations on Windows and Linux systems are covered.
The document provides information about building a small network including devices, applications, protocols, and connectivity verification. It discusses [1] selecting common devices for a small network like routers, switches, and end devices, [2] applications and protocols used in small networks such as HTTP, SMTP, and DHCP, and [3] using the ping and traceroute commands to verify connectivity between devices and troubleshoot connectivity issues.
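As a companion to the connectivity-verification discussion, here is a small sketch that wraps the same `ping` and `traceroute` commands the documents describe. It assumes a Linux-style host where those tools are on the PATH (on Windows the command is `tracert` and ping uses `-n` instead of `-c`), and the target address is a made-up example.

```python
# Sketch: verify reachability with ping, then trace the path if ping fails.
# Assumes Linux-style ping/traceroute binaries are available on the PATH.
import subprocess

def is_reachable(host: str, count: int = 3) -> bool:
    """Return True if the host answers ICMP echo requests."""
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True)
    return result.returncode == 0

def trace_path(host: str) -> str:
    """Return the hop-by-hop path toward the host."""
    result = subprocess.run(["traceroute", host], capture_output=True, text=True)
    return result.stdout

target = "192.168.1.1"   # assumed default-gateway address for the example
if is_reachable(target):
    print(f"{target} is reachable")
else:
    print(f"{target} unreachable, tracing path:\n{trace_path(target)}")
```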
Tackling Retail Technology Management Challenges at the Edge - Rebekah Rodriguez
As the adoption of intelligent applications in the Retail industry grows, so do their technology requirements. This creates challenges for store operators to navigate the deployment and maintenance of hardware, applications, and management tools at locations without a dedicated IT staff. Along with the complexity of solutions, these operators are dealing with a wide range of installation scenarios, specific to the products or services they offer.
Purpose-built edge systems, such as Supermicro’s Fanless servers powered by Intel® Xeon D® processors, provide a secure and rugged platform that can be deployed where conventional servers cannot. These systems, along with our broad portfolio of short-depth rackmount systems, can be combined with the Reliant Platform by Acumera, a secure, cost-effective, and central cloud managed solution to automate delivery and management of applications, networking, and security controls either in-store or in the cloud.
Join this webinar to hear how these solutions are currently employed across thousands of locations, simplifying Edge IT for many major retailers today. Speakers will include David Nielsen, Sr. System Product Manager for IoT and Edge Applications, Richard Newman and Brett Stewart, Technology Leaders at Acumera, as well as Craig Carter, Product Line Manager from Intel to discuss the latest generation of the Xeon D platform.
PacketCloud: an Open Platform for Elastic In-network Services - yeung2000
This document proposes PacketCloud, an open platform for hosting elastic in-network services. PacketCloud uses cloudlets located at ISP network edges to provide virtual instances for third-party services. These services can be user-requested or transparently intercept traffic. A prototype demonstrates services like encryption achieving over 500Mbps on one node and over 10Gbps across 20 nodes in a cloudlet with minimal delay. The platform aims to efficiently share network resources while providing economic rewards for ISPs and third parties.
Developers’ mDay in Banja Luka - Janko Isidorović, Mainflux – Unified IoT Pl... - mCloud
This document provides information on unified IoT platforms and discusses Mainflux, an open source IoT platform. It begins with an overview of IoT devices, edge computing, on-premise deployment and cloud deployment challenges. It emphasizes the importance of a unified architecture to reduce costs and complexity. The document then describes Mainflux, highlighting its use of microservices and ability to deploy on various hardware from constrained devices to the cloud. It discusses how Mainflux addresses issues like scalability, security and support for multiple protocols and databases.
This document discusses Internet of Things (IoT) networking standards that enable IoT devices to connect using IPv6. It covers protocols like 6LoWPAN that allow IPv6 to be used over low-power wireless networks, as well as standards for routing (RPL), security (DTLS, CoAP), and device management (LWM2M, SUIT). It also mentions low-power wide area network technologies (LPWANs) and the Thread mesh networking standard that are important for connecting many IoT devices using IPv6.
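To make the constrained-device protocols above a little more concrete, here is a minimal CoAP request sketch using the `aiocoap` Python library. The library choice, device address, and resource path are assumptions for illustration; the document does not prescribe any particular implementation.

```python
# Minimal CoAP GET, the kind of request a 6LoWPAN/Thread sensor might answer.
# Uses the aiocoap library; the device address and resource path are illustrative.
import asyncio
from aiocoap import Context, Message, GET

async def read_sensor():
    protocol = await Context.create_client_context()
    request = Message(code=GET, uri="coap://[2001:db8::1]/sensors/temperature")
    response = await protocol.request(request).response
    print(f"{response.code}: {response.payload.decode()}")

asyncio.run(read_sensor())
```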
A session in the DevNet Zone at Cisco Live, Berlin. Flare allows users with mobile devices to discover and interact with things in an environment. It combines multiple location technologies, such as iBeacon and CMX, with a realtime communications architecture to enable new kinds of user interactions. This session will introduce the Flare REST and Socket.IO API, server, client libraries and sample code, and introduce you to the resources available on DevNet and GitHub. Come visit us in the DevNet zone for a hands-on demonstration.
The document discusses Cisco's Field Area Network solution including:
- Multi-service connectivity using smart endpoints, fog computing, security, management, and standards
- The DevNet and Solution Partner Program focusing on technology, partner stories, and participation levels
- Cisco's approach to enabling IoT applications including open standards, security, management, and providing application capabilities at the network edge
Similar to Designing Cloud and Grid Computing Systems with InfiniBand and High-Speed Ethernet
The document discusses VXLAN BGP EVPN based multi-pod, multi-fabric, and multi-site network architectures. It provides an overview of the evolution from single-pod/fabric deployments to more complex multi-site designs. Key concepts covered include hierarchical overlay domains, isolated underlay domains, and border gateways used to interconnect sites while maintaining underlay isolation.
OVN is an open source virtual network solution for Open vSwitch that provides logical L2 and L3 networking, including logical switches, routers, security groups, and multiple tunneling protocols. It is designed to scale to thousands of hypervisors and VMs, improve performance over existing plugins, and integrate with OpenStack and other cloud management systems through its databases and daemons. OVN aims to become the default virtual network solution in OpenStack Neutron by replacing the existing OVS plugin.
08 sdn system intelligence short public beijing sdn conference - 130828Mason Mei
This document discusses software defined networking (SDN) and IBM's SDN strategy. It introduces IBM's SDN Virtual Environment (SDN-VE) platform, which uses Distributed Overlay Virtual Ethernet (DOVE) technology to virtualize the physical network and provide automated connectivity for virtual workloads. SDN-VE integrates with OpenStack and IBM's SmartCloud solutions. It also discusses how SDN can address client requirements through dynamic virtual system provisioning, workload-aware networking, and simplified scalability of servers, storage and networks.
The OpenFlow Conformance Test Program is currently under development and facing several barriers. Key elements like the test specification and reference test code are being developed but are behind schedule. Testing for the 1.3 specification currently lacks funding. Barriers include a lack of contributions from volunteers as testing increases in complexity, and an unsustainable business model. Possible steps to address these issues include catching development up, increasing visibility of the program, and developing a long-term strategic vision and viable business model with funding from ONF members and other interested parties.
This document discusses cloud data center network architectures and how to scale them using Arista switches. It describes the limitations of legacy data center designs and introduces the cloud networking model. The cloud networking model with Arista switches provides benefits like lower latency, no oversubscription between racks, and the ability to scale to hundreds of racks. The document then discusses how to scale the network using layer 2, layer 3, and VXLAN designs from thousands to over a million nodes. It provides examples of scaling the number of leaf and spine switches to achieve greater node counts in a non-blocking two-tier design.
This document discusses how Amazon Web Services (AWS) can be used to build scalable game infrastructure. It recommends using AWS services like Elastic Load Balancing, Amazon EC2, Amazon RDS, Amazon S3, CloudFront, Auto Scaling, ElastiCache and DynamoDB to allow games to automatically scale capacity based on player demand. It also discusses global deployment across AWS regions and using VPC networking and placement groups to optimize game server performance.
Atf 3 q15-8 - introducing macro-segementationMason Mei
This document introduces Arista's Macro-Segmentation technology. It defines micro-segmentation as fine-grained security policies within a virtual switch, while macro-segmentation inserts services between workgroups in the physical network. Arista's solution utilizes CloudVision to automate security service insertion across physical and virtual networks in a coordinated way, providing security for east-west traffic without proprietary requirements. It scales out firewall deployment rather than requiring larger centralized devices.
This document announces dates for technical forums to be held in various European and African cities from June through December. It thanks attendees for joining and encourages inviting colleagues to the spring events. Dates are provided for London, Stockholm, Paris, Frankfurt, Zurich, Johannesburg, CapeTown, Milan, Utrecht and Madrid, with dates still to be confirmed for Warsaw, Moscow and Dublin.
Atf 3 q15-7 - delivering cloud scale workflow automation control and visibili...Mason Mei
1. CloudVision Portal provides cloud scale workflow automation and control/visibility through Arista EOS devices, workflow automation services, and the CloudVision framework including management/monitoring tools, orchestrators, and overlay controllers using open RESTful APIs.
2. CloudVision Portal is provided as an OVA virtual appliance running on a CENTOS Linux platform with Apache Hadoop, HBase database, Hazelcast in-memory database, and Zookeeper, and an Apache Tomcat HTML 5 front-end for a highly extensible user interface.
3. CloudVision provides bootstrap automation through zero touch provisioning, configuration automation using containers to define roles and customize taxonomy for groups of switches and configlets, and operations visibility
Atf 3 q15-4 - scaling the the software driven cloud networkMason Mei
This document discusses network virtualization and the Arista CloudVision eXchange (CVX) platform. It provides 3 key points:
1. Network virtualization using VXLAN allows for any-to-any Layer 2 connectivity across Layer 3 subnets, enabling VM mobility. The CVX platform provides automation of VXLAN deployment without a controller.
2. CVX acts as a single point of integration and provisioning for the physical network. It aggregates network state from EOS switches and presents it to controllers through open APIs. This provides visibility, simplifies provisioning, and improves scalability of controller integration.
3. CVX services include providing the physical topology database, distributing VXLAN configuration
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Designing Cloud and Grid Computing Systems with InfiniBand and High-Speed Ethernet
1. Designing Cloud and Grid Computing Systems
with InfiniBand and High-Speed Ethernet
Dhabaleswar K. (DK) Panda
The Ohio State University
E-mail: panda@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~panda
A Tutorial at CCGrid ’11
by
Sayantan Sur
The Ohio State University
E-mail: surs@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~surs
2. • Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installations
• Sample Case Studies and Performance Numbers
• Conclusions and Final Q&A
CCGrid '11
Presentation Overview
2
3. CCGrid '11
Current and Next Generation Applications and
Computing Systems
3
• Diverse Range of Applications
– Processing and dataset characteristics vary
• Growth of High Performance Computing
– Growth in processor performance
• Chip density doubles every 18 months
– Growth in commodity networking
• Increase in speed/features + reducing cost
• Different Kinds of Systems
– Clusters, Grid, Cloud, Datacenters, …..
4. CCGrid '11
Cluster Computing Environment
[Diagram: a compute cluster (frontend and compute nodes on a LAN) connected over a LAN to a storage cluster (meta-data manager and I/O server nodes holding meta-data and data)]
4
6. CCGrid '11
Grid Computing Environment
6
[Diagram: three sites, each with a compute cluster (frontend, compute nodes) and a storage cluster (meta-data manager, I/O server nodes) on local LANs, interconnected over a WAN]
7. CCGrid '11
Multi-Tier Datacenters and Enterprise Computing
7
[Diagram: enterprise multi-tier datacenter with Tier 1 routers/servers, Tier 2 application servers, and Tier 3 database servers connected through switches]
8. CCGrid '11
Integrated High-End Computing Environments
[Diagram: a compute cluster and storage cluster connected over a LAN/WAN to an enterprise multi-tier datacenter for visualization and mining (Tier 1 routers/servers, Tier 2 application servers, Tier 3 database servers)]
8
9. CCGrid '11
Cloud Computing Environments
9
[Diagram: physical machines hosting VMs, each with local storage, connected over a LAN to a virtual file system with a meta-data server and I/O servers]
10. Hadoop Architecture
• Underlying Hadoop Distributed
File System (HDFS)
• Fault-tolerance by replicating
data blocks
• NameNode: stores information
on data blocks
• DataNodes: store blocks and
host Map-reduce computation
• JobTracker: tracks jobs and
detects failures
• Model scales but high amount
of communication during
intermediate phases
10CCGrid '11
11. Memcached Architecture
• Distributed Caching Layer
– Allows aggregation of spare memory from multiple nodes
– General purpose
• Typically used to cache database queries, results of API calls
• Scalable model, but typical usage very network intensive
11
[Diagram: Internet clients reach web-frontend servers, which use a system area network to access memcached servers and, in turn, database servers]
CCGrid '11
12. • Good System Area Networks with excellent performance
(low latency, high bandwidth and low CPU utilization) for
inter-processor communication (IPC) and I/O
• Good Storage Area Networks with high performance I/O
• Good WAN connectivity in addition to intra-cluster
SAN/LAN connectivity
• Quality of Service (QoS) for interactive applications
• RAS (Reliability, Availability, and Serviceability)
• With low cost
CCGrid '11
Networking and I/O Requirements
12
13. • Hardware components
– Processing cores and memory
subsystem
– I/O bus or links
– Network adapters/switches
• Software components
– Communication stack
• Bottlenecks can artificially
limit the network performance
the user perceives
CCGrid '11 13
Major Components in Computing Systems
[Diagram: two processors (four cores each) with memory, connected through an I/O bus to a network adapter and network switch; processing, I/O interface, and network bottlenecks are marked]
14. • Ex: TCP/IP, UDP/IP
• Generic architecture for all networks
• Host processor handles almost all aspects
of communication
– Data buffering (copies on sender and
receiver)
– Data integrity (checksum)
– Routing aspects (IP routing)
• Signaling between different layers
– Hardware interrupt on packet arrival or
transmission
– Software signals between different layers
to handle protocol processing in different
priority levels
CCGrid '11
Processing Bottlenecks in Traditional Protocols
14
[Diagram: same system view, with the processing bottleneck highlighted at the host processors]
15. • Traditionally relied on bus-based
technologies (last mile bottleneck)
– E.g., PCI, PCI-X
– One bit per wire
– Performance increase through:
• Increasing clock speed
• Increasing bus width
– Not scalable:
• Cross talk between bits
• Skew between wires
• Signal integrity makes it difficult to increase bus
width significantly, especially for high clock speeds
CCGrid '11
Bottlenecks in Traditional I/O Interfaces and Networks
15
PCI (1990): 33MHz/32bit: 1.05Gbps (shared bidirectional)
PCI-X (1998, v1.0): 133MHz/64bit: 8.5Gbps (shared bidirectional)
PCI-X (2003, v2.0): 266-533MHz/64bit: 17Gbps (shared bidirectional)
[Diagram: same system view, with the I/O interface bottleneck highlighted at the I/O bus]
16. • Network speeds saturated at
around 1Gbps
– Features provided were limited
– Commodity networks were not
considered scalable enough for very
large-scale systems
CCGrid '11 16
Bottlenecks on Traditional Networks
Ethernet (1979 - ) 10 Mbit/sec
Fast Ethernet (1993 -) 100 Mbit/sec
Gigabit Ethernet (1995 -) 1000 Mbit/sec
ATM (1995 -) 155/622/1024 Mbit/sec
Myrinet (1993 -) 1 Gbit/sec
Fibre Channel (1994 -) 1 Gbit/sec
[Diagram: same system view, with the network bottleneck highlighted at the network adapter and switch]
17. • Industry Networking Standards
• InfiniBand and High-speed Ethernet were introduced into
the market to address these bottlenecks
• InfiniBand aimed at all three bottlenecks (protocol
processing, I/O bus, and network speed)
• Ethernet aimed at directly handling the network speed
bottleneck and relying on complementary technologies to
alleviate the protocol processing and I/O bus bottlenecks
CCGrid '11 17
Motivation for InfiniBand and High-speed Ethernet
18. • Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installations
• Sample Case Studies and Performance Numbers
• Conclusions and Final Q&A
CCGrid '11
Presentation Overview
18
19. • IB Trade Association was formed with seven industry leaders
(Compaq, Dell, HP, IBM, Intel, Microsoft, and Sun)
• Goal: To design a scalable and high performance communication
and I/O architecture by taking an integrated view of computing,
networking, and storage technologies
• Many other industry partners participated in the effort to define the IB
architecture specification
• IB Architecture (Volume 1, Version 1.0) was released to public
on Oct 24, 2000
– Latest version 1.2.1 released January 2008
• http://www.infinibandta.org
CCGrid '11
IB Trade Association
19
20. • 10GE Alliance formed by several industry leaders to take
the Ethernet family to the next speed step
• Goal: To achieve a scalable and high performance
communication architecture while maintaining backward
compatibility with Ethernet
• http://www.ethernetalliance.org
• 40-Gbps (Servers) and 100-Gbps Ethernet (Backbones,
Switches, Routers): IEEE 802.3 WG
• Energy-efficient and power-conscious protocols
– On-the-fly link speed reduction for under-utilized links
CCGrid '11
High-speed Ethernet Consortium (10GE/40GE/100GE)
20
21. • Network speed bottlenecks
• Protocol processing bottlenecks
• I/O interface bottlenecks
CCGrid '11 21
Tackling Communication Bottlenecks with IB and HSE
22. • Bit serial differential signaling
– Independent pairs of wires to transmit independent
data (called a lane)
– Scalable to any number of lanes
– Easy to increase clock speed of lanes (since each lane
consists only of a pair of wires)
• Theoretically, no perceived limit on the
bandwidth
CCGrid '11
Network Bottleneck Alleviation: InfiniBand (“Infinite
Bandwidth”) and High-speed Ethernet (10/40/100 GE)
22
24. InfiniBand Link Speed Standardization Roadmap
[Roadmap chart: bandwidth per direction (Gbps) vs. year (2005-2015) for 1x, 4x, 8x and 12x links, from DDR and QDR (e.g., 16G-IB-DDR, 32G-IB-QDR) through FDR and EDR to projected HDR and NDR generations]
Per Lane & Rounded Per Link Bandwidth (Gb/s), by number of lanes per direction:
Lanes  DDR (4G/lane)  QDR (8G/lane)  FDR (14.025G/lane)  EDR (25.78125G/lane)
12     48+48          96+96          168+168             300+300
8      32+32          64+64          112+112             200+200
4      16+16          32+32          56+56               100+100
1      4+4            8+8            14+14               25+25
SDR = Single Data Rate (not shown), DDR = Double Data Rate, QDR = Quad Data Rate,
FDR = Fourteen Data Rate, EDR = Enhanced Data Rate, HDR = High Data Rate, NDR = Next Data Rate
CCGrid '11 24
25. • Network speed bottlenecks
• Protocol processing bottlenecks
• I/O interface bottlenecks
CCGrid '11 25
Tackling Communication Bottlenecks with IB and HSE
26. • Intelligent Network Interface Cards
• Support entire protocol processing completely in hardware
(hardware protocol offload engines)
• Provide a rich communication interface to applications
– User-level communication capability
– Gets rid of intermediate data buffering requirements
• No software signaling between communication layers
– All layers are implemented on a dedicated hardware unit, and not
on a shared host CPU
CCGrid '11
Capabilities of High-Performance Networks
26
27. • Fast Messages (FM)
– Developed by UIUC
• Myricom GM
– Proprietary protocol stack from Myricom
• These network stacks set the trend for high-performance
communication requirements
– Hardware offloaded protocol stack
– Support for fast and secure user-level access to the protocol stack
• Virtual Interface Architecture (VIA)
– Standardized by Intel, Compaq, Microsoft
– Precursor to IB
CCGrid '11
Previous High-Performance Network Stacks
27
28. • Some IB models have multiple hardware accelerators
– E.g., Mellanox IB adapters
• Protocol Offload Engines
– Completely implement ISO/OSI layers 2-4 (link layer, network layer
and transport layer) in hardware
• Additional hardware supported features also present
– RDMA, Multicast, QoS, Fault Tolerance, and many more
CCGrid '11
IB Hardware Acceleration
28
29. • Interrupt Coalescing
– Improves throughput, but degrades latency
• Jumbo Frames
– No latency impact; Incompatible with existing switches
• Hardware Checksum Engines
– Checksum performed in hardware significantly faster
– Shown to have minimal benefit independently
• Segmentation Offload Engines (a.k.a. Virtual MTU)
– Host processor “thinks” that the adapter supports large Jumbo
frames, but the adapter splits it into regular sized (1500-byte) frames
– Supported by most HSE products because of its backward
compatibility with “regular” Ethernet
– Heavily used in the “server-on-steroids” model
• High performance servers connected to regular clients
CCGrid '11
Ethernet Hardware Acceleration
29
30. • TCP Offload Engines (TOE)
– Hardware Acceleration for the entire TCP/IP stack
– Initially patented by Tehuti Networks
– Actually refers to the IC on the network adapter that implements
TCP/IP
– In practice, usually referred to as the entire network adapter
• Internet Wide-Area RDMA Protocol (iWARP)
– Standardized by IETF and the RDMA Consortium
– Support acceleration features (like IB) for Ethernet
• http://www.ietf.org & http://www.rdmaconsortium.org
CCGrid '11
TOE and iWARP Accelerators
30
31. • Also known as “Datacenter Ethernet” or “Lossless Ethernet”
– Combines a number of optional Ethernet standards into one umbrella
as mandatory requirements
• Sample enhancements include:
– Priority-based flow-control: Link-level flow control for each Class of
Service (CoS)
– Enhanced Transmission Selection (ETS): Bandwidth assignment to
each CoS
– Datacenter Bridging Exchange Protocols (DCBX): Congestion
notification, Priority classes
– End-to-end Congestion notification: Per flow congestion control to
supplement per link flow control
CCGrid '11
Converged (Enhanced) Ethernet (CEE or CE)
31
32. • Network speed bottlenecks
• Protocol processing bottlenecks
• I/O interface bottlenecks
CCGrid '11 32
Tackling Communication Bottlenecks with IB and HSE
33. • InfiniBand initially intended to replace I/O bus
technologies with networking-like technology
– That is, bit serial differential signaling
– With enhancements in I/O technologies that use a similar
architecture (HyperTransport, PCI Express), this has become
mostly irrelevant now
• Both IB and HSE today come as network adapters that plug
into existing I/O technologies
CCGrid '11
Interplay with I/O Technologies
33
34. • Recent trends in I/O interfaces show that they are nearly
matching head-to-head with network speeds (though they
still lag a little bit)
CCGrid '11
Trends in I/O Interfaces with Servers
PCI (1990): 33MHz/32bit: 1.05Gbps (shared bidirectional)
PCI-X (1998, v1.0): 133MHz/64bit: 8.5Gbps (shared bidirectional); (2003, v2.0): 266-533MHz/64bit: 17Gbps (shared bidirectional)
AMD HyperTransport (HT) (2001 v1.0, 2004 v2.0, 2006 v3.0, 2008 v3.1): 102.4Gbps (v1.0), 179.2Gbps (v2.0), 332.8Gbps (v3.0), 409.6Gbps (v3.1) (32 lanes)
PCI-Express (PCIe) by Intel (2003 Gen1, 2007 Gen2, 2009 Gen3 standard): Gen1: 4X (8Gbps), 8X (16Gbps), 16X (32Gbps); Gen2: 4X (16Gbps), 8X (32Gbps), 16X (64Gbps); Gen3: 4X (~32Gbps), 8X (~64Gbps), 16X (~128Gbps)
Intel QuickPath Interconnect (QPI) (2009): 153.6-204.8Gbps (20 lanes)
34
35. • Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installations
• Sample Case Studies and Performance Numbers
• Conclusions and Final Q&A
CCGrid '11
Presentation Overview
35
36. • InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
– Novel Features
– Subnet Management and Services
• High-speed Ethernet Family
– Internet Wide Area RDMA Protocol (iWARP)
– Alternate vendor-specific protocol stacks
• InfiniBand/Ethernet Convergence Technologies
– Virtual Protocol Interconnect (VPI)
– (InfiniBand) RDMA over Ethernet (RoE)
– (InfiniBand) RDMA over Converged (Enhanced) Ethernet (RoCE)
CCGrid '11
IB, HSE and their Convergence
36
37. CCGrid '11 37
Comparing InfiniBand with Traditional Networking Stack
InfiniBand stack:
- Application Layer: MPI, PGAS, File Systems
- Transport Layer: OpenFabrics Verbs; RC (reliable), UD (unreliable)
- Network Layer: Routing
- Link Layer: Flow-control, Error Detection
- Physical Layer: Copper or Optical
- Fabric management: OpenSM (management tool)
Traditional Ethernet stack:
- Application Layer: HTTP, FTP, MPI, File Systems (Sockets Interface)
- Transport Layer: TCP, UDP
- Network Layer: Routing
- Link Layer: Flow-control and Error Detection
- Physical Layer: Copper, Optical or Wireless
- Management: DNS management tools
38. • InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
• Communication Model
• Memory registration and protection
• Channel and memory semantics
– Novel Features
• Hardware Protocol Offload
– Link, network and transport layer features
– Subnet Management and Services
CCGrid '11
IB Overview
38
39. • Used by processing and I/O units to
connect to fabric
• Consume & generate IB packets
• Programmable DMA engines with
protection features
• May have multiple ports
– Independent buffering channeled
through Virtual Lanes
• Host Channel Adapters (HCAs)
CCGrid '11
Components: Channel Adapters
39
[Diagram: a channel adapter with a programmable DMA engine, memory, queue pairs (QPs), transport and subnet management agent (SMA), and multiple ports each with virtual lanes (VLs)]
40. • Relay packets from a link to another
• Switches: intra-subnet
• Routers: inter-subnet
• May support multicast
CCGrid '11
Components: Switches and Routers
40
[Diagram: a switch relays packets between ports (each with virtual lanes); a router does the same across subnets using the Global Route Header (GRH)]
41. • Network Links
– Copper, Optical, Printed Circuit wiring on Back Plane
– Not directly addressable
• Traditional adapters built for copper cabling
– Restricted by cable length (signal integrity)
– For example, QDR copper cables are restricted to 7m
• Intel Connects: Optical cables with Copper-to-optical
conversion hubs (acquired by Emcore)
– Up to 100m length
– 550 picoseconds
copper-to-optical conversion latency
• Available from other vendors (Luxtera)
• Repeaters (Vol. 2 of InfiniBand specification)
CCGrid '11
Components: Links & Repeaters
(Courtesy Intel)
41
42. • InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
• Communication Model
• Memory registration and protection
• Channel and memory semantics
– Novel Features
• Hardware Protocol Offload
– Link, network and transport layer features
– Subnet Management and Services
CCGrid '11
IB Overview
42
44. • Each QP has two queues
– Send Queue (SQ)
– Receive Queue (RQ)
– Work requests are queued to the QP
(WQEs: “Wookies”)
• QP to be linked to a Completion Queue
(CQ)
– Gives notification of operation
completion from QPs
– Completed WQEs are placed in the
CQ with additional information
(CQEs: “Cookies”)
CCGrid '11
Queue Pair Model
[Diagram: an InfiniBand device with a QP (send and receive queues holding WQEs) linked to a CQ holding CQEs]
44
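As a minimal sketch of how this model surfaces in software (assuming the OpenFabrics verbs API that the tutorial refers to; the helper name and queue depths below are illustrative, not from the slides):

```c
/* Illustrative only: create a CQ and an RC QP with libibverbs.
 * Error handling is omitted; ctx and pd are an opened device context
 * and protection domain. */
#include <infiniband/verbs.h>

struct ibv_qp *create_qp_example(struct ibv_context *ctx, struct ibv_pd *pd)
{
    /* Completion queue that will hold CQEs ("cookies") for completed WQEs */
    struct ibv_cq *cq = ibv_create_cq(ctx, 128 /* depth */, NULL, NULL, 0);

    struct ibv_qp_init_attr attr = {
        .send_cq = cq,          /* completions for send WQEs    */
        .recv_cq = cq,          /* completions for receive WQEs */
        .cap = {
            .max_send_wr  = 64, /* outstanding send WQEs        */
            .max_recv_wr  = 64, /* outstanding receive WQEs     */
            .max_send_sge = 1,
            .max_recv_sge = 1,
        },
        .qp_type = IBV_QPT_RC,  /* Reliable Connection transport */
    };
    return ibv_create_qp(pd, &attr); /* QP = send queue + receive queue */
}
```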
45. 1. Registration Request
• Send virtual address and length
2. Kernel handles virtual->physical
mapping and pins region into
physical memory
• Process cannot map memory
that it does not own (security !)
3. HCA caches the virtual to physical
mapping and issues a handle
• Includes an l_key and r_key
4. Handle is returned to application
CCGrid '11
Memory Registration
Before we do any communication:
All memory used for communication must
be registered
[Diagram: registration steps 1-4 flowing between the process, the kernel, and the HCA/RNIC]
45
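A minimal sketch of the registration call with the verbs API (illustrative; the buffer size, access flags and helper name are assumptions, not from the slides):

```c
/* Illustrative only: pin and register a buffer so the HCA can DMA it.
 * The returned memory region carries the keys used for protection. */
#include <stdlib.h>
#include <infiniband/verbs.h>

struct ibv_mr *register_buffer(struct ibv_pd *pd, size_t len)
{
    void *buf = malloc(len);   /* virtual address to be pinned by the kernel */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    /* mr->lkey is used in local WQEs; mr->rkey is handed to the peer
     * (e.g., via a send/recv exchange) to authorize RDMA on this buffer. */
    return mr;
}
```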
46. • To send or receive data the l_key
must be provided to the HCA
• HCA verifies access to local
memory
• For RDMA, initiator must have the
r_key for the remote virtual address
• Possibly exchanged with a
send/recv
• r_key is not encrypted in IB
CCGrid '11
Memory Protection
[Diagram: process, kernel, and HCA/NIC, with the l_key presented to the HCA for local access]
r_key is needed for RDMA operations; for security, keys are required for all operations that touch buffers
46
47. CCGrid '11
Communication in the Channel Semantics
(Send/Receive Model)
[Diagram: two nodes, each with a processor, memory segments, and an InfiniBand device holding a QP (send/recv queues) and a CQ; the message moves between memory segments and is acknowledged by a hardware ACK]
Send WQE contains information about the send buffer (multiple non-contiguous segments)
Receive WQE contains information on the receive buffer (multiple non-contiguous segments); incoming messages have to be matched to a receive WQE to know where to place the data
Processor is involved only to:
1. Post receive WQE
2. Post send WQE
3. Pull out completed CQEs from the CQ
47
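A minimal sketch of the channel semantics with the verbs API (illustrative; the QP, CQ, memory region and buffers are assumed to be set up as in the earlier sketches):

```c
/* Illustrative only: the receiver pre-posts a receive WQE, the sender posts
 * a send WQE, and each side pulls its completion (CQE) out of its CQ. */
#include <stdint.h>
#include <infiniband/verbs.h>

void post_recv(struct ibv_qp *qp, struct ibv_mr *mr, void *buf, uint32_t len)
{
    struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey };
    struct ibv_recv_wr wr = { .wr_id = 1, .sg_list = &sge, .num_sge = 1 }, *bad;
    ibv_post_recv(qp, &wr, &bad);            /* 1. post receive WQE */
}

void post_send(struct ibv_qp *qp, struct ibv_mr *mr, void *buf, uint32_t len)
{
    struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey };
    struct ibv_send_wr wr = { .wr_id = 2, .sg_list = &sge, .num_sge = 1,
                              .opcode = IBV_WR_SEND,
                              .send_flags = IBV_SEND_SIGNALED }, *bad;
    ibv_post_send(qp, &wr, &bad);            /* 2. post send WQE */
}

void wait_completion(struct ibv_cq *cq)
{
    struct ibv_wc wc;                        /* 3. pull the completed CQE  */
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;                                    /* busy-poll until it appears */
}
```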
48. CCGrid '11
Communication in the Memory Semantics (RDMA Model)
[Diagram: initiator and target nodes with memory segments and InfiniBand devices; the initiator's HCA places data directly into the target's memory segment, acknowledged by a hardware ACK]
Send WQE contains information about the send buffer (multiple segments) and the receive buffer (single segment)
Initiator processor is involved only to:
1. Post send WQE
2. Pull out completed CQE from the send CQ
No involvement from the target processor
48
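A minimal sketch of the memory semantics with the verbs API (illustrative; the remote address and rkey are assumed to have been exchanged beforehand, e.g. over a send/recv):

```c
/* Illustrative only: an RDMA write that places local data directly into a
 * remote buffer, with no receive WQE and no CPU involvement at the target. */
#include <stdint.h>
#include <infiniband/verbs.h>

void rdma_write(struct ibv_qp *qp, struct ibv_mr *mr, void *local_buf,
                uint32_t len, uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = { .addr = (uintptr_t)local_buf,
                           .length = len, .lkey = mr->lkey };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,  /* memory semantics: no recv WQE  */
        .send_flags = IBV_SEND_SIGNALED,  /* CQE generated on the send CQ   */
    }, *bad;
    wr.wr.rdma.remote_addr = remote_addr; /* registered target address      */
    wr.wr.rdma.rkey        = rkey;        /* remote key authorizing access  */
    ibv_post_send(qp, &wr, &bad);
}
```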
49. CCGrid '11
Communication in the Memory Semantics (Atomics)
[Diagram: initiator and target nodes with InfiniBand devices; the atomic operation (OP) is applied to a 64-bit destination memory segment at the target, with the source memory segment at the initiator]
Send WQE contains information about the send buffer (single 64-bit segment) and the receive buffer (single 64-bit segment)
Initiator processor is involved only to:
1. Post send WQE
2. Pull out completed CQE from the send CQ
No involvement from the target processor
IB supports compare-and-swap and fetch-and-add atomic operations
49
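A minimal sketch of an atomic operation with the verbs API (illustrative; the 64-bit remote address and rkey are assumed to have been exchanged beforehand):

```c
/* Illustrative only: fetch-and-add on a remote 64-bit location; the value
 * fetched from the target is deposited into the local 64-bit buffer. */
#include <stdint.h>
#include <infiniband/verbs.h>

void atomic_fetch_add(struct ibv_qp *qp, struct ibv_mr *mr, uint64_t *local_result,
                      uint64_t remote_addr, uint32_t rkey, uint64_t add_value)
{
    struct ibv_sge sge = { .addr = (uintptr_t)local_result,
                           .length = sizeof(uint64_t), .lkey = mr->lkey };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_ATOMIC_FETCH_AND_ADD, /* or IBV_WR_ATOMIC_CMP_AND_SWP */
        .send_flags = IBV_SEND_SIGNALED,
    }, *bad;
    wr.wr.atomic.remote_addr = remote_addr;  /* 64-bit aligned target address */
    wr.wr.atomic.rkey        = rkey;
    wr.wr.atomic.compare_add = add_value;    /* value added at the target     */
    ibv_post_send(qp, &wr, &bad);
}
```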
50. • InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
• Communication Model
• Memory registration and protection
• Channel and memory semantics
– Novel Features
• Hardware Protocol Offload
– Link, network and transport layer features
– Subnet Management and Services
CCGrid '11
IB Overview
50
52. • Buffering and Flow Control
• Virtual Lanes, Service Levels and QoS
• Switching and Multicast
CCGrid '11
Link/Network Layer Capabilities
52
53. • IB provides three-levels of communication throttling/control
mechanisms
– Link-level flow control (link layer feature)
– Message-level flow control (transport layer feature): discussed later
– Congestion control (part of the link layer features)
• IB provides an absolute credit-based flow-control
– Receiver guarantees that enough space is allotted for N blocks of data
– Occasional update of available credits by the receiver
• Has no relation to the number of messages, but only to the
total amount of data being sent
– One 1MB message is equivalent to 1024 1KB messages (except for
rounding off at message boundaries)
CCGrid '11
Buffering and Flow Control
53
54. • Multiple virtual links within
same physical link
– Between 2 and 16
• Separate buffers and flow
control
– Avoids Head-of-Line
Blocking
• VL15: reserved for
management
• Each port supports one or
more data VL
CCGrid '11
Virtual Lanes
54
55. • Service Level (SL):
– Packets may operate at one of 16 different SLs
– Meaning not defined by IB
• SL to VL mapping:
– SL determines which VL on the next link is to be used
– Each port (switches, routers, end nodes) has a SL to VL mapping
table configured by the subnet management
• Partitions:
– Fabric administration (through Subnet Manager) may assign
specific SLs to different partitions to isolate traffic flows
CCGrid '11
Service Levels and QoS
55
56. • InfiniBand Virtual Lanes
allow the multiplexing of
multiple independent logical
traffic flows on the same
physical link
• Providing the benefits of
independent, separate
networks while eliminating
the cost and difficulties
associated with maintaining
two or more networks
CCGrid '11
Traffic Segregation Benefits
(Courtesy: Mellanox Technologies)
Routers, Switches
VPN’s, DSLAMs
Storage Area Network
RAID, NAS, Backup
IPC, Load Balancing, Web Caches, ASP
InfiniBand
Network
Virtual Lanes
Servers
Fabric
ServersServers
IP Network
InfiniBand
Fabric
56
57. • Each port has one or more associated LIDs (Local
Identifiers)
– Switches look up which port to forward a packet to based on its
destination LID (DLID)
– This information is maintained at the switch
• For multicast packets, the switch needs to maintain
multiple output ports to forward the packet to
– Packet is replicated to each appropriate output port
– Ensures at-most once delivery & loop-free forwarding
– There is an interface for a group management protocol
• Create, join/leave, prune, delete group
CCGrid '11
Switching (Layer-2 Routing) and Multicast
57
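To make the LID-based lookup concrete, here is a toy sketch (purely illustrative, not how switch silicon or the IB specification defines it) of a linear forwarding table indexed by destination LID:

```c
/* Illustrative only: a unicast forwarding table filled in by the subnet
 * manager; each incoming packet's DLID selects the output port. */
#include <stdint.h>

#define UNICAST_LIDS 49152          /* illustrative unicast LID space       */

struct switch_lft {
    uint8_t out_port[UNICAST_LIDS]; /* DLID -> output port                  */
};

static inline uint8_t forward(const struct switch_lft *lft, uint16_t dlid)
{
    return lft->out_port[dlid];     /* one table lookup per unicast packet  */
}
```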
58. • Basic unit of switching is a crossbar
– Current InfiniBand products use either 24-port (DDR) or 36-port
(QDR) crossbars
• Switches available in the market are typically collections of
crossbars within a single cabinet
• Do not confuse “non-blocking switches” with “crossbars”
– Crossbars provide all-to-all connectivity to all connected nodes
• For any random node pair selection, all communication is non-blocking
– Non-blocking switches provide a fat-tree of many crossbars
• For any random node pair selection, there exists a switch
configuration such that communication is non-blocking
• If the communication pattern changes, the same switch configuration
might no longer provide fully non-blocking communication
CCGrid '11 58
Switch Complex
59. • Someone has to set up the forwarding tables and
give every port an LID
– “Subnet Manager” does this work
• Different routing algorithms give different paths
CCGrid '11
IB Switching/Routing: An Example
[Diagram: an example IB switch block diagram (Mellanox 144-port) built from leaf and spine crossbar blocks, with end nodes having LIDs 2 and 4 and a forwarding table mapping DLID 2 to out-port 1 and DLID 4 to out-port 4]
Switching: IB supports Virtual Cut Through (VCT)
Routing: unspecified by the IB spec; Up*/Down* and Shift are popular routing engines supported by OFED
• Fat-Tree is a popular topology for IB clusters
– Different over-subscription ratios may be used
• Other topologies are also being used
– 3D Torus (Sandia Red Sky) and SGI Altix (Hypercube)
59
60. • Similar to basic switching, except…
– … sender can utilize multiple LIDs associated to the same
destination port
• Packets sent to one DLID take a fixed path
• Different packets can be sent using different DLIDs
• Each DLID can have a different path (switch can be configured
differently for each DLID)
• Can cause out-of-order arrival of packets
– IB uses a simplistic approach:
• If packets in one connection arrive out-of-order, they are dropped
– Easier to use different DLIDs for different connections
• This is what most high-level libraries using IB do!
CCGrid '11 60
More on Multipathing
63. • Each transport service can have zero or more QPs
associated with it
– E.g., you can have four QPs based on RC and one QP based on UD
CCGrid '11
IB Transport Services
Service Type            Connection Oriented   Acknowledged   Transport
Reliable Connection     Yes                   Yes            IBA
Unreliable Connection   Yes                   No             IBA
Reliable Datagram       No                    Yes            IBA
Unreliable Datagram     No                    No             IBA
Raw Datagram            No                    No             Raw
63
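The transport service is fixed when the QP is created; a minimal sketch with the verbs API (illustrative; the queue depths and helper name are assumptions, not from the slides):

```c
/* Illustrative only: an Unreliable Datagram QP; an RC or UC QP is created
 * the same way with IBV_QPT_RC or IBV_QPT_UC as the qp_type. */
#include <infiniband/verbs.h>

struct ibv_qp *create_ud_qp(struct ibv_pd *pd, struct ibv_cq *cq)
{
    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap = { .max_send_wr = 64, .max_recv_wr = 64,
                 .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_UD,  /* datagram: the peer is chosen per WQE via an
                                   address handle and remote QP number */
    };
    return ibv_create_qp(pd, &attr);
}
```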
64. CCGrid '11
Trade-offs in Different Transport Types
64
Attribute: Reliable Connection (RC) | Reliable Datagram (RD) | eXtended Reliable Connection (XRC) | Unreliable Connection (UC) | Unreliable Datagram (UD) | Raw Datagram
Scalability (M processes, N nodes): M²N QPs per HCA (RC) | M QPs per HCA (RD) | MN QPs per HCA (XRC) | M²N QPs per HCA (UC) | M QPs per HCA (UD) | 1 QP per HCA (Raw)
Reliability (corrupt data detected): Yes for all service types
Data Delivery Guarantee: data delivered exactly once (RC, RD, XRC) | no guarantees (UC, UD, Raw)
Data Order Guarantees: per connection (RC, XRC) | one source to multiple destinations (RD) | unordered, duplicate data detected (UC) | no (UD, Raw)
Data Loss Detected: Yes (RC, RD, XRC, UC) | No (UD, Raw)
Error Recovery: errors (retransmissions, alternate path, etc.) handled by the transport layer, client only involved in handling fatal errors (links broken, protection violation, etc.) (RC, RD, XRC) | packets with errors and sequence errors are reported to the responder (UC) | none (UD, Raw)
65. • Data Segmentation
• Transaction Ordering
• Message-level Flow Control
• Static Rate Control and Auto-negotiation
CCGrid '11
Transport Layer Capabilities
65
66. • IB transport layer provides a message-level communication
granularity, not byte-level (unlike TCP)
• Application can hand over a large message
– Network adapter segments it to MTU sized packets
– Single notification when the entire message is transmitted or
received (not per packet)
• Reduced host overhead to send/receive messages
– Depends on the number of messages, not the number of bytes
CCGrid '11 66
Data Segmentation
67. • IB follows a strong transaction ordering for RC
• Sender network adapter transmits messages in the order
in which WQEs were posted
• Each QP utilizes a single LID
– All WQEs posted on same QP take the same path
– All packets are received by the receiver in the same order
– All receive WQEs are completed in the order in which they were
posted
CCGrid '11
Transaction Ordering
67
68. • Also called as End-to-end Flow-control
– Does not depend on the number of network hops
• Separate from Link-level Flow-Control
– Link-level flow-control only relies on the number of bytes being
transmitted, not the number of messages
– Message-level flow-control only relies on the number of messages
transferred, not the number of bytes
• If 5 receive WQEs are posted, the sender can send 5
messages (can post 5 send WQEs)
– If the sent messages are larger than the posted receive buffers,
flow-control cannot handle it
CCGrid '11 68
Message-level Flow-Control
69. • IB allows link rates to be statically changed
– On a 4X link, we can set data to be sent at 1X
– For heterogeneous links, rate can be set to the lowest link rate
– Useful for low-priority traffic
• Auto-negotiation also available
– E.g., if you connect a 4X adapter to a 1X switch, data is
automatically sent at 1X rate
• Only fixed settings available
– Cannot set rate requirement to 3.16 Gbps, for example
CCGrid '11 69
Static Rate Control and Auto-Negotiation
70. • InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
• Communication Model
• Memory registration and protection
• Channel and memory semantics
– Novel Features
• Hardware Protocol Offload
– Link, network and transport layer features
– Subnet Management and Services
CCGrid '11
IB Overview
70
71. • Agents
– Processes or hardware units running on each adapter, switch,
router (everything on the network)
– Provide capability to query and set parameters
• Managers
– Make high-level decisions and implement them on the network fabric
using the agents
• Messaging schemes
– Used for interactions between the manager and agents (or
between agents)
• Messages
CCGrid '11
Concepts in IB Management
71
73. • InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
– Novel Features
– Subnet Management and Services
• High-speed Ethernet Family
– Internet Wide Area RDMA Protocol (iWARP)
– Alternate vendor-specific protocol stacks
• InfiniBand/Ethernet Convergence Technologies
– Virtual Protocol Interconnect (VPI)
– (InfiniBand) RDMA over Ethernet (RoE)
– (InfiniBand) RDMA over Converged (Enhanced) Ethernet (RoCE)
CCGrid '11
IB, HSE and their Convergence
73
74. • High-speed Ethernet Family
– Internet Wide-Area RDMA Protocol (iWARP)
• Architecture and Components
• Features
– Out-of-order data placement
– Dynamic and Fine-grained Data Rate control
– Multipathing using VLANs
• Existing Implementations of HSE/iWARP
– Alternate Vendor-specific Stacks
• MX over Ethernet (for Myricom 10GE adapters)
• Datagram Bypass Layer (for Myricom 10GE adapters)
• Solarflare OpenOnload (for Solarflare 10GE adapters)
CCGrid '11
HSE Overview
74
75. CCGrid '11
IB and HSE RDMA Models: Commonalities and
Differences
Feature: IB | iWARP/HSE
Hardware Acceleration: Supported | Supported
RDMA: Supported | Supported
Atomic Operations: Supported | Not supported
Multicast: Supported | Supported
Congestion Control: Supported | Supported
Data Placement: Ordered | Out-of-order
Data Rate-control: Static and Coarse-grained | Dynamic and Fine-grained
QoS: Prioritization | Prioritization and Fixed Bandwidth QoS
Multipathing: Using DLIDs | Using VLANs
75
76. • RDMA Protocol (RDMAP)
– Feature-rich interface
– Security Management
• Remote Direct Data Placement (RDDP)
– Data Placement and Delivery
– Multi Stream Semantics
– Connection Management
• Marker PDU Aligned (MPA)
– Middle Box Fragmentation
– Data Integrity (CRC)
CCGrid '11
iWARP Architecture and Components
[Diagram (courtesy iWARP specification): application or library in user space over the iWARP offload engines (RDMAP, RDDP, MPA) layered on TCP or SCTP over IP, implemented in the network adapter (e.g., 10GigE) below the device driver]
76
77. • High-speed Ethernet Family
– Internet Wide-Area RDMA Protocol (iWARP)
• Architecture and Components
• Features
– Out-of-order data placement
– Dynamic and Fine-grained Data Rate control
• Existing Implementations of HSE/iWARP
– Alternate Vendor-specific Stacks
• MX over Ethernet (for Myricom 10GE adapters)
• Datagram Bypass Layer (for Myricom 10GE adapters)
• Solarflare OpenOnload (for Solarflare 10GE adapters)
CCGrid '11
HSE Overview
77
78. • Place data as it arrives, whether in or out-of-order
• If data is out-of-order, place it at the appropriate offset
• Issues from the application’s perspective:
– The second half of a message having been placed does not mean that
the first half of the message has arrived as well
– If one message has been placed, it does not mean that the
previous messages have been placed
• Issues from protocol stack’s perspective
– The receiver network stack has to understand each frame of data
• If the frame is unchanged during transmission, this is easy!
– The MPA protocol layer adds appropriate information at regular
intervals to allow the receiver to identify fragmented frames
CCGrid '11
Decoupled Data Placement and Data Delivery
78
79. • Part of the Ethernet standard, not iWARP
– Network vendors use a separate interface to support it
• Dynamic bandwidth allocation to flows based on interval
between two packets in a flow
– E.g., one stall for every packet sent on a 10 Gbps network refers to
a bandwidth allocation of 5 Gbps
– Complicated because of TCP windowing behavior
• Important for high-latency/high-bandwidth networks
– Large windows exposed on the receiver side
– Receiver overflow controlled through rate control
CCGrid '11
Dynamic and Fine-grained Rate Control
79
80. • Can allow for simple prioritization:
– E.g., connection 1 performs better than connection 2
– 8 classes provided (a connection can be in any class)
• Similar to SLs in InfiniBand
– Two priority classes for high-priority traffic
• E.g., management traffic or your favorite application
• Or can allow for specific bandwidth requests:
– E.g., can request for 3.62 Gbps bandwidth
– Packet pacing and stalls used to achieve this
• Query functionality to find out “remaining bandwidth”
CCGrid '11
Prioritization and Fixed Bandwidth QoS
80
81. • High-speed Ethernet Family
– Internet Wide-Area RDMA Protocol (iWARP)
• Architecture and Components
• Features
– Out-of-order data placement
– Dynamic and Fine-grained Data Rate control
• Existing Implementations of HSE/iWARP
– Alternate Vendor-specific Stacks
• MX over Ethernet (for Myricom 10GE adapters)
• Datagram Bypass Layer (for Myricom 10GE adapters)
• Solarflare OpenOnload (for Solarflare 10GE adapters)
CCGrid '11
HSE Overview
81
82. • Regular Ethernet adapters and
TOEs are fully compatible
• Compatibility with iWARP
required
• Software iWARP emulates the
functionality of iWARP on the
host
– Fully compatible with hardware
iWARP
– Internally utilizes host TCP/IP stack
CCGrid '11
Software iWARP based Compatibility
82
[Diagram: a distributed cluster environment in which a regular Ethernet cluster, an iWARP cluster, and a TOE cluster interoperate over a wide area network]
83. CCGrid '11
Different iWARP Implementations
[Diagram (protocol stacks): iWARP implementation options — software iWARP over regular Ethernet adapters (user-level or kernel-level iWARP above the host TCP/IP stack, with TCP modified for MPA), iWARP over TCP offload engines (high-performance sockets over offloaded TCP/IP), and iWARP-compliant adapters with iWARP, TCP and IP fully offloaded; implementations from OSU, OSC, IBM, ANL, and from Chelsio and NetEffect (Intel)]
83
84. • High-speed Ethernet Family
– Internet Wide-Area RDMA Protocol (iWARP)
• Architecture and Components
• Features
– Out-of-order data placement
– Dynamic and Fine-grained Data Rate control
– Multipathing using VLANs
• Existing Implementations of HSE/iWARP
– Alternate Vendor-specific Stacks
• MX over Ethernet (for Myricom 10GE adapters)
• Datagram Bypass Layer (for Myricom 10GE adapters)
• Solarflare OpenOnload (for Solarflare 10GE adapters)
CCGrid '11
HSE Overview
84
85. • Proprietary communication layer developed by Myricom for
their Myrinet adapters
– Third generation communication layer (after FM and GM)
– Supports Myrinet-2000 and the newer Myri-10G adapters
• Low-level “MPI-like” messaging layer
– Almost one-to-one match with MPI semantics (including connection-
less model, implicit memory registration and tag matching)
– Later versions added some more advanced communication methods
such as RDMA to support other programming models such as ARMCI
(low-level runtime for the Global Arrays PGAS library)
• Open-MX
– New open-source implementation of the MX interface for non-
Myrinet adapters from INRIA, France
CCGrid '11 85
Myrinet Express (MX)
86. • Another proprietary communication layer developed by
Myricom
– Compatible with regular UDP sockets (embraces and extends)
– Idea is to bypass the kernel stack and give UDP applications direct
access to the network adapter
• High performance and low-jitter
• Primary motivation: Financial market applications (e.g.,
stock market)
– Applications prefer unreliable communication
– Timeliness is more important than reliability
• This stack is covered by NDA; more details can be
requested from Myricom
CCGrid '11 86
Datagram Bypass Layer (DBL)
87. CCGrid '11
Solarflare Communications: OpenOnload Stack
87
[Figure: typical commodity networking stack vs. typical HPC networking stack vs. the Solarflare approach to the networking stack]
• HPC networking stacks provide many performance benefits, but have limitations in
certain scenarios, especially where applications fork(), exec() and need
asynchronous progress (per application)
• Solarflare approach:
– Network hardware provides a user-safe interface that routes packets directly to
applications based on flow information in the packet headers
– Protocol processing can happen in both kernel and user space
– Protocol state is shared between the application and the kernel using shared memory
Courtesy Solarflare communications (www.openonload.org/openonload-google-talk.pdf)
88. • InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
– Novel Features
– Subnet Management and Services
• High-speed Ethernet Family
– Internet Wide Area RDMA Protocol (iWARP)
– Alternate vendor-specific protocol stacks
• InfiniBand/Ethernet Convergence Technologies
– Virtual Protocol Interconnect (VPI)
– (InfiniBand) RDMA over Ethernet (RoE)
– (InfiniBand) RDMA over Converged (Enhanced) Ethernet (RoCE)
CCGrid '11
IB, HSE and their Convergence
88
89. • Single network firmware to support
both IB and Ethernet
• Autosensing of layer-2 protocol
– Can be configured to automatically
work with either IB or Ethernet
networks
• Multi-port adapters can use one
port on IB and another on Ethernet
• Multiple use modes:
– Datacenters with IB inside the
cluster and Ethernet outside
– Clusters with IB network and
Ethernet management
CCGrid '11
Virtual Protocol Interconnect (VPI)
[Figure: VPI adapter with an IB port and an Ethernet port; applications use IB verbs over the IB transport, network and link layers on the IB port, and sockets over TCP/IP (with hardware TCP/IP support) over the Ethernet link layer on the Ethernet port]
89
90. • Native convergence of IB network and transport
layers with Ethernet link layer
• IB packets encapsulated in Ethernet frames
• IB network layer already uses IPv6 frames
• Pros:
– Works natively in Ethernet environments (entire
Ethernet management ecosystem is available)
– Has all the benefits of IB verbs
• Cons:
– Network bandwidth might be limited to Ethernet
switches: 10GE switches available; 40GE yet to
arrive; 32 Gbps IB available
– Some IB native link-layer features are optional in
(regular) Ethernet
• Approved by OFA board to be included into OFED
CCGrid '11
(InfiniBand) RDMA over Ethernet (IBoE or RoE)
[Figure: RoE stack: application and IB verbs over the IB transport and network layers, carried over an Ethernet link layer in hardware]
90
91. • Very similar to IB over Ethernet
– Often used interchangeably with IBoE
– Can be used to explicitly specify link layer is
Converged (Enhanced) Ethernet (CE)
• Pros:
– Works natively in Ethernet environments (entire
Ethernet management ecosystem is available)
– Has all the benefits of IB verbs
– CE is very similar to the link layer of native IB, so
there are no missing features
• Cons:
– Network bandwidth might be limited to Ethernet
switches: 10GE switches available; 40GE yet to
arrive; 32 Gbps IB available
CCGrid '11
(InfiniBand) RDMA over Converged (Enhanced) Ethernet
(RoCE)
[Figure: RoCE stack: application and IB verbs over the IB transport and network layers, carried over a Converged (Enhanced) Ethernet link layer in hardware]
91
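Because RoCE keeps the IB verbs interface intact end to end, code that posts RDMA operations is identical over native IB and RoCE. The fragment below is a minimal sketch (not from the tutorial) of posting an RDMA write on an already connected reliable-connection queue pair; the protection domain, queue pair, remote address and rkey are assumed to have been set up and exchanged out of band.

    /* Minimal sketch: post one RDMA write on a connected RC queue pair.
     * Assumes qp, pd, remote_addr and rkey were established beforehand,
     * e.g., through the RDMA connection manager and a side channel.
     * Compile with -libverbs. */
    #include <stdint.h>
    #include <string.h>
    #include <infiniband/verbs.h>

    int post_rdma_write(struct ibv_qp *qp, struct ibv_pd *pd,
                        void *buf, size_t len,
                        uint64_t remote_addr, uint32_t rkey)
    {
        /* Register the local buffer so the adapter can DMA it */
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
        if (!mr)
            return -1;

        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,
            .length = (uint32_t)len,
            .lkey   = mr->lkey,
        };

        struct ibv_send_wr wr, *bad_wr = NULL;
        memset(&wr, 0, sizeof(wr));
        wr.opcode              = IBV_WR_RDMA_WRITE;
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.send_flags          = IBV_SEND_SIGNALED;  /* ask for a completion */
        wr.wr.rdma.remote_addr = remote_addr;        /* target virtual address */
        wr.wr.rdma.rkey        = rkey;               /* remote memory key */

        return ibv_post_send(qp, &wr, &bad_wr);      /* 0 on success */
    }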
93. • Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installations
• Sample Case Studies and Performance Numbers
• Conclusions and Final Q&A
CCGrid '11
Presentation Overview
93
94. • Many IB vendors: Mellanox+Voltaire and Qlogic
– Aligned with many server vendors: Intel, IBM, SUN, Dell
– And many integrators: Appro, Advanced Clustering, Microway
• Broadly two kinds of adapters
– Offloading (Mellanox) and Onloading (Qlogic)
• Adapters with different interfaces:
– Dual port 4X with PCI-X (64 bit/133 MHz), PCIe x8, PCIe 2.0 and HT
• MemFree Adapter
– No memory on the HCA; uses system memory (through PCIe)
– Good for LOM designs (Tyan S2935, Supermicro 6015T-INFB)
• Different speeds
– SDR (8 Gbps), DDR (16 Gbps) and QDR (32 Gbps)
• Some 12X SDR adapters exist as well (24 Gbps each way)
• New ConnectX-2 adapter from Mellanox supports offload for
collectives (Barrier, Broadcast, etc.)
CCGrid '11
IB Hardware Products
94
95. CCGrid '11
Tyan Thunder S2935 Board
(Courtesy Tyan)
Similar boards from Supermicro with LOM features are also available
95
96. • Customized adapters to work with IB switches
– Cray XD1 (formerly by Octigabay), Cray CX1
• Switches:
– 4X SDR and DDR (8-288 ports); 12X SDR (small sizes)
– 3456-port “Magnum” switch from SUN used at TACC
• 72-port “nano magnum”
– 36-port Mellanox InfiniScale IV QDR switch silicon in 2008
• Up to 648-port QDR switch by Mellanox and SUN
• Some internal ports are 96 Gbps (12X QDR)
– IB switch silicon from Qlogic introduced at SC ’08
• Up to 846-port QDR switch by Qlogic
– New FDR (56 Gbps) switch silicon (Bridge-X) has been announced by
Mellanox in May ‘11
• Switch Routers with Gateways
– IB-to-FC; IB-to-IP
CCGrid '11
IB Hardware Products (contd.)
96
97. • 10GE adapters: Intel, Myricom, Mellanox (ConnectX)
• 10GE/iWARP adapters: Chelsio, NetEffect (now owned by Intel)
• 40GE adapters: Mellanox ConnectX2-EN 40G
• 10GE switches
– Fulcrum Microsystems
• Low latency switch based on 24-port silicon
• FM4000 switch with IP routing, and TCP/UDP support
– Fujitsu, Myricom (512 ports), Force10, Cisco, Arista (formerly Arastra)
• 40GE and 100GE switches
– Nortel Networks
• 10GE downlinks with 40GE and 100GE uplinks
– Broadcom has announced 40GE switch in early 2010
CCGrid '11
10G, 40G and 100G Ethernet Products
97
98. • Mellanox ConnectX Adapter
• Supports IB and HSE convergence
• Ports can be configured to support IB or HSE
• Support for VPI and RoCE
– 8 Gbps (SDR), 16 Gbps (DDR) and 32 Gbps (QDR) rates available
for IB
– 10GE rate available for RoCE
– 40GE rate for RoCE is expected to be available in the near future
CCGrid '11
Products Providing IB and HSE Convergence
98
99. • Open source organization (formerly OpenIB)
– www.openfabrics.org
• Incorporates both IB and iWARP in a unified manner
– Support for Linux and Windows
– Design of complete stack with “best of breed” components
• Gen1
• Gen2 (current focus)
• Users can download the entire stack and run
– Latest release is OFED 1.5.3
– OFED 1.6 is underway
CCGrid '11
Software Convergence with OpenFabrics
99
100. CCGrid '11
OpenFabrics Stack with Unified Verbs Interface
[Figure: unified verbs interface: applications call libibverbs at user level, backed by vendor provider libraries (Mellanox libmthca, QLogic libipathverbs, IBM libehca, Chelsio libcxgb3) and the corresponding kernel modules (ib_mthca, ib_ipath, ib_ehca, ib_cxgb3) driving Mellanox, QLogic, IBM and Chelsio adapters]
100
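From the application's point of view, the unified verbs interface sketched above means the same code runs over any of these providers. The following minimal example (an illustration assuming libibverbs is installed, not part of the tutorial material) enumerates the RDMA devices present, opens the first one and queries it; the provider library that actually services the calls is selected automatically.

    /* Minimal sketch: enumerate and open an RDMA device through the
     * unified libibverbs interface.  Compile with -libverbs. */
    #include <stdio.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int num = 0;
        struct ibv_device **list = ibv_get_device_list(&num);
        if (!list || num == 0) {
            fprintf(stderr, "no RDMA devices found\n");
            return 1;
        }

        for (int i = 0; i < num; i++)
            printf("device %d: %s\n", i, ibv_get_device_name(list[i]));

        /* Open the first device and allocate a protection domain */
        struct ibv_context *ctx = ibv_open_device(list[0]);
        struct ibv_pd *pd = ibv_alloc_pd(ctx);

        struct ibv_device_attr attr;
        if (ibv_query_device(ctx, &attr) == 0)
            printf("ports: %d, max QPs: %d\n",
                   attr.phys_port_cnt, attr.max_qp);

        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(list);
        return 0;
    }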
101. • For IBoE and RoCE, the upper-level
stacks remain completely
unchanged
• Within the hardware:
– Transport and network layers remain
completely unchanged
– Both IB and Ethernet (or CEE) link
layers are supported on the network
adapter
• Note: The OpenFabrics stack is not
valid for the Ethernet path in VPI
– That still uses sockets and TCP/IP
CCGrid '11
OpenFabrics on Convergent IB/HSE
[Figure: verbs interface (libibverbs) over the ConnectX user-level provider (libmlx4) and kernel module (ib_mlx4), driving ConnectX adapters on both the IB and HSE ports]
101
102. CCGrid '11
OpenFabrics Software Stack
Key:
– SA: Subnet Administrator
– MAD: Management Datagram
– SMA: Subnet Manager Agent
– PMA: Performance Manager Agent
– IPoIB: IP over InfiniBand
– SDP: Sockets Direct Protocol
– SRP: SCSI RDMA Protocol (Initiator)
– iSER: iSCSI RDMA Protocol (Initiator)
– RDS: Reliable Datagram Service
– UDAPL: User Direct Access Programming Library
– HCA: Host Channel Adapter
– R-NIC: RDMA NIC
[Figure: the OpenFabrics software stack: hardware-specific drivers for InfiniBand HCAs and iWARP R-NICs; kernel-level verbs/API with connection manager, MAD, SA client and SMA; user-level verbs/API with the connection manager abstraction (CMA), user-level MAD API, OpenSM and diagnostic tools; upper-layer protocols (IPoIB, SDP, SRP, iSER, RDS, NFS-RDMA RPC, cluster file systems); applications reach the stack through IP-based access, sockets-based access (SDP library), various MPIs, UDAPL, file-system and block-storage access, with kernel-bypass paths for user-level access]
102
104. InfiniBand in the Top500 (Nov. 2010)
[Figure: interconnect family share of the Nov. 2010 Top500; by number of systems: Gigabit Ethernet 45%, InfiniBand 43%, Proprietary 6%, Custom 4%, all others 1% or less; by performance: InfiniBand 44%, Gigabit Ethernet 25%, Proprietary 21%, Custom 9%, all others 1% or less]
CCGrid '11 104
105. 105
InfiniBand System Efficiency in the Top500 List
CCGrid '11
[Figure: computer cluster efficiency comparison: compute efficiency (%) versus Top500 rank for IB-CPU, IB-GPU/Cell, GigE, 10GigE, IBM BlueGene and Cray systems]
106. • 214 IB Clusters (42.8%) in the Nov ‘10 Top500 list (http://www.top500.org)
• Installations in the Top 30 (13 systems):
CCGrid '11
Large-scale InfiniBand Installations
– 120,640 cores (Nebulae) in China (3rd)
– 73,278 cores (Tsubame-2.0) in Japan (4th)
– 138,368 cores (Tera-100) in France (6th)
– 122,400 cores (RoadRunner) at LANL (7th)
– 81,920 cores (Pleiades) at NASA Ames (11th)
– 42,440 cores (Red Sky) at Sandia (14th)
– 62,976 cores (Ranger) at TACC (15th)
– 35,360 cores (Lomonosov) in Russia (17th)
– 15,120 cores (Loewe) in Germany (22nd)
– 26,304 cores (Juropa) in Germany (23rd)
– 26,232 cores (TachyonII) in South Korea (24th)
– 23,040 cores (Jade) at GENCI (27th)
– 33,120 cores (Mole-8.5) in China (28th)
– More are getting installed!
106
107. • HSE compute systems with ranking in the Nov 2010 Top500 list
– 8,856-core installation in Purdue with ConnectX-EN 10GigE (#126)
– 7,944-core installation in Purdue with 10GigE Chelsio/iWARP (#147)
– 6,828-core installation in Germany (#166)
– 6,144-core installation in Germany (#214)
– 6,144-core installation in Germany (#215)
– 7,040-core installation at the Amazon EC2 Cluster (#231)
– 4,000-core installation at HHMI (#349)
• Other small clusters
– 640-core installation at the University of Heidelberg, Germany
– 512-core installation at Sandia National Laboratory (SNL) with
Chelsio/iWARP and Woven Systems switch
– 256-core installation at Argonne National Lab with Myri-10G
• Integrated Systems
– BG/P uses 10GE for I/O (ranks 9, 13, 16, and 34 in the Top 50)
CCGrid '11 107
HSE Scientific Computing Installations
108. • HSE has most of its popularity in enterprise computing and
other non-scientific markets including Wide-area
networking
• Example Enterprise Computing Domains
– Enterprise Datacenters (HP, Intel)
– Animation firms (e.g., Universal Studios (“The Hulk”), 20th Century
Fox (“Avatar”), and many new movies using 10GE)
– Amazon’s HPC cloud offering uses 10GE internally
– Heavily used in financial markets (users are typically undisclosed)
• Many Network-attached Storage devices come integrated
with 10GE network adapters
• ESnet to install $62M 100GE infrastructure for US DOE
CCGrid '11 108
Other HSE Installations
109. • Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installations
• Sample Case Studies and Performance Numbers
• Conclusions and Final Q&A
CCGrid '11
Presentation Overview
109
111. • Low-level Network Performance
• Clusters with Message Passing Interface (MPI)
• Datacenters with Sockets Direct Protocol (SDP) and TCP/IP
(IPoIB)
• InfiniBand in WAN and Grid-FTP
• Cloud Computing: Hadoop and Memcached
CCGrid '11 111
Case Studies
112. CCGrid '11 112
Low-level Latency Measurements
[Figure: small-message and large-message latency (us) versus message size for VPI-IB, native IB, VPI-Eth and RoCE]
ConnectX-DDR: 2.4 GHz Quad-core (Nehalem) Intel with IB and 10GE switches
RoCE has a slight overhead compared to native IB because it operates at a slower clock rate (required to support
only a 10Gbps link for Ethernet, as compared to a 32Gbps link for IB)
114. • Low-level Network Performance
• Clusters with Message Passing Interface (MPI)
• Datacenters with Sockets Direct Protocol (SDP) and TCP/IP
(IPoIB)
• InfiniBand in WAN and Grid-FTP
• Cloud Computing: Hadoop and Memcached
CCGrid '11 114
Case Studies
115. • High Performance MPI Library for IB and HSE
– MVAPICH (MPI-1) and MVAPICH2 (MPI-2.2)
– Used by more than 1,550 organizations in 60 countries
– More than 66,000 downloads from OSU site directly
– Empowering many TOP500 clusters
• 11th ranked 81,920-core cluster (Pleiades) at NASA
• 15th ranked 62,976-core cluster (Ranger) at TACC
– Available with software stacks of many IB, HSE and server vendors
including Open Fabrics Enterprise Distribution (OFED) and Linux
Distros (RedHat and SuSE)
– http://mvapich.cse.ohio-state.edu
CCGrid '11 115
MVAPICH/MVAPICH2 Software
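The one-way latency results on the next slide follow the usual ping-pong methodology; a minimal sketch of such a test is shown below (an illustration only, not the actual OSU micro-benchmark code), with latency reported as half of the averaged round-trip time.

    /* Minimal MPI ping-pong latency sketch; run with exactly two
     * processes (e.g., mpirun -np 2 ./pingpong). */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int size = 1, iters = 10000;
        int rank;
        char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = calloc(size, 1);

        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("%d-byte one-way latency: %.2f us\n",
                   size, (t1 - t0) * 1e6 / (2.0 * iters));

        free(buf);
        MPI_Finalize();
        return 0;
    }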
116. CCGrid '11 116
One-way Latency: MPI over IB
[Figure: one-way MPI latency versus message size (small-message and large-message ranges) for MVAPICH over Qlogic DDR, Qlogic QDR PCIe2, ConnectX DDR and ConnectX QDR PCIe2; small-message latencies of the four configurations range from 1.54 to 2.17 us]
All numbers taken on 2.4 GHz Quad-core (Nehalem) Intel with IB switch
122. • Low-level Network Performance
• Clusters with Message Passing Interface (MPI)
• Datacenters with Sockets Direct Protocol (SDP) and
TCP/IP (IPoIB)
• InfiniBand in WAN and Grid-FTP
• Cloud Computing: Hadoop and Memcached
CCGrid '11 122
Case Studies
123. CCGrid '11 123
IPoIB vs. SDP Architectural Models
[Figure: traditional model vs. a possible SDP model; in the traditional model, a sockets application goes through the sockets API, the kernel TCP/IP sockets provider, the TCP/IP transport driver and the IPoIB driver to reach the InfiniBand channel adapter; in the SDP model, the Sockets Direct Protocol bypasses the kernel TCP/IP path and uses RDMA semantics directly over the InfiniBand channel adapter]
(Source: InfiniBand Trade Association)
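In both models the application itself is ordinary, unmodified sockets code. The minimal client below (illustrative only; the address and port are placeholders) runs over plain Ethernet, runs over IPoIB simply by connecting to the IPoIB interface's IP address, and can be redirected over SDP transparently with the OFED libsdp preload library (e.g., LD_PRELOAD=libsdp.so).

    /* Minimal sockets client; the same binary works over Ethernet, over
     * IPoIB (use the IPoIB interface address), or over SDP when run with
     * the OFED preload library, e.g. LD_PRELOAD=libsdp.so ./client.
     * Address and port below are placeholders. */
    #include <stdio.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in srv = { .sin_family = AF_INET,
                                   .sin_port   = htons(5000) };
        inet_pton(AF_INET, "192.168.1.10", &srv.sin_addr);   /* placeholder */

        if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
            perror("connect");
            return 1;
        }
        const char msg[] = "hello over IPoIB or SDP";
        write(fd, msg, sizeof(msg));
        close(fd);
        return 0;
    }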
125. • Low-level Network Performance
• Clusters with Message Passing Interface (MPI)
• Datacenters with Sockets Direct Protocol (SDP) and TCP/IP
(IPoIB)
• InfiniBand in WAN and Grid-FTP
• Cloud Computing: Hadoop and Memcached
CCGrid '11 125
Case Studies
126. • Option 1: Layer-1 Optical networks
– IB standard specifies link, network and transport layers
– Can use any layer-1 (though the standard says copper and optical)
• Option 2: Link Layer Conversion Techniques
– InfiniBand-to-Ethernet conversion at the link layer: switches
available from multiple companies (e.g., Obsidian)
• Technically, it’s not conversion; it’s just tunneling (L2TP)
– InfiniBand’s network layer is IPv6 compliant
CCGrid '11 126
IB on the WAN
127. Features
• End-to-end guaranteed bandwidth channels
• Dynamic, in-advance, reservation and provisioning of fractional/full lambdas
• Secure control-plane for signaling
• Peering with ESnet, National Science Foundation CHEETAH, and other networks
CCGrid '11 127
UltraScience Net: Experimental Research Network Testbed
Since 2005
This and the following IB WAN slides are courtesy Dr. Nagi Rao (ORNL)
128. • Supports SONET OC-192 or
10GE LAN-PHY/WAN-PHY
• Idea is to make remote storage
“appear” local
• IB-WAN switch does frame
conversion
– IB standard allows per-hop credit-
based flow control
– IB-WAN switch uses large internal
buffers to allow enough credits to
fill the wire
CCGrid '11 128
IB-WAN Connectivity with Obsidian Switches
[Figure: host with CPUs, system controller, system memory and an HCA on the host bus, connected through an IB switch and an IB-WAN switch to remote storage over a wide-area SONET/10GE link]
129. CCGrid '11 129
InfiniBand Over SONET: Obsidian Longbows RDMA
throughput measurements over USN
[Figure: Linux hosts at ORNL connected through Obsidian Longbow IB/S units and CDCI nodes at ORNL, Chicago, Seattle and Sunnyvale over OC-192 circuits]
• IB 4x SDR line rate: 8 Gbps; host-to-host through a local switch: 7.5 Gbps
• ORNL loop (0.2 mile): 7.48 Gbps
• ORNL-Chicago loop (1,400 miles): 7.47 Gbps
• ORNL-Chicago-Seattle loop (6,600 miles): 7.37 Gbps
• ORNL-Chicago-Seattle-Sunnyvale loop (8,600 miles): 7.34 Gbps
• Hosts: dual-socket quad-core 2 GHz AMD Opteron, 4 GB memory, 8-lane PCI-Express slot, dual-port Voltaire 4x SDR HCA
130. CCGrid '11 130
IB over 10GE LAN-PHY and WAN-PHY
[Figure: the same ORNL/Chicago/Seattle/Sunnyvale testbed carried over 10GE WAN-PHY through E300 units alongside OC-192]
• ORNL loop (0.2 mile): 7.5 Gbps
• ORNL-Chicago loop (1,400 miles): 7.49 Gbps
• ORNL-Chicago-Seattle loop (6,600 miles): 7.39 Gbps
• ORNL-Chicago-Seattle-Sunnyvale loop (8,600 miles): 7.36 Gbps
131. MPI over IB-WAN: Obsidian Routers
• Emulated WAN delay and the distance it corresponds to: 10 us = 2 km, 100 us = 20 km, 1,000 us = 200 km, 10,000 us = 2,000 km
[Figure: Cluster A and Cluster B connected through Obsidian WAN routers over a WAN link with variable delay; plots of MPI bidirectional bandwidth versus delay and of the impact of encryption on message rate at 0 ms delay]
• Hardware encryption has no impact on performance when only a few streams are communicating
131CCGrid '11
S. Narravula, H. Subramoni, P. Lai, R. Noronha and D. K. Panda, Performance of HPC Middleware over
InfiniBand WAN, Int'l Conference on Parallel Processing (ICPP '08), September 2008.
132. Communication Options in Grid
• Multiple options exist to perform data transfer on Grid
• Globus-XIO framework currently does not support IB natively
• We create the Globus-XIO ADTS driver and add native IB support
to GridFTP
[Figure: GridFTP and other high performance computing applications run over the Globus XIO framework, which can use IB verbs, IPoIB, RoCE or TCP/IP over a 10 GigE network, with Obsidian routers for the wide area]
132
CCGrid '11
133. Globus-XIO Framework with ADTS Driver
[Figure: the Globus XIO interface dispatches to a stack of XIO drivers; the Globus-XIO ADTS driver provides data connection management, persistent session management, buffer and file management, and a data transport interface with a zero-copy channel, memory registration and flow control over modern WAN interconnects (InfiniBand/RoCE and 10GigE/iWARP), alongside the file system]
133CCGrid '11
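To give a feel for how a GridFTP transport path is assembled from XIO drivers, the sketch below composes a driver stack and opens a handle over it. It is illustrative only: the "tcp" driver ships with Globus, the contact string is a placeholder, error handling is omitted, and the name an ADTS-style driver would be loaded under is an assumption rather than something taken from this work.

    /* Illustrative Globus XIO sketch: load a driver, push it onto a stack,
     * and open a handle over that stack.  A native-IB path would push an
     * ADTS-style driver here instead of "tcp" (driver name hypothetical). */
    #include "globus_xio.h"

    int main(void)
    {
        globus_xio_driver_t driver;
        globus_xio_stack_t  stack;
        globus_xio_handle_t handle;
        globus_size_t       nbytes;
        char                msg[] = "hello";

        globus_module_activate(GLOBUS_XIO_MODULE);

        globus_xio_driver_load("tcp", &driver);
        globus_xio_stack_init(&stack, NULL);
        globus_xio_stack_push_driver(stack, driver);

        globus_xio_handle_create(&handle, stack);
        globus_xio_open(handle, "host.example.org:5000", NULL);  /* placeholder */

        globus_xio_write(handle, (globus_byte_t *)msg, sizeof(msg),
                         sizeof(msg), &nbytes, NULL);

        globus_xio_close(handle, NULL);
        globus_xio_driver_unload(driver);
        globus_module_deactivate(GLOBUS_XIO_MODULE);
        return 0;
    }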
134. 134
Performance of Memory Based
Data Transfer
• Performance numbers obtained while transferring 128 GB of
aggregate data in chunks of 256 MB files
• ADTS based implementation is able to saturate the link
bandwidth
• Best performance for ADTS obtained when performing data
transfer with a network buffer of size 32 MB
[Figure: memory-based transfer bandwidth (MBps) versus network delay (us) for the ADTS and UDT drivers with 2 MB, 8 MB, 32 MB and 64 MB network buffers]
CCGrid '11
135. 135
Performance of Disk Based Data Transfer
• Performance numbers obtained while transferring 128 GB of
aggregate data in chunks of 256 MB files
• Predictable as well as better performance when Disk-IO
threads assist network thread (Asynchronous Mode)
• Best performance for ADTS obtained with a circular buffer
with individual buffers of size 64 MB
[Figure: disk-based transfer bandwidth (MBps) versus network delay (us) in synchronous and asynchronous modes, for ADTS with 8-64 MB buffers and for IPoIB with 64 MB buffers]
CCGrid '11
136. 136
Application Level Performance
[Figure: bandwidth (MBps) of ADTS versus IPoIB for the CCSM and Ultra-Viz target applications]
• Application performance for FTP get
operation for disk based transfers
• Community Climate System Model
(CCSM)
• Part of Earth System Grid Project
• Transfers 160 TB of total data in
chunks of 256 MB
• Network latency - 30 ms
• Ultra-Scale Visualization (Ultra-Viz)
• Transfers files of size 2.6 GB
• Network latency - 80 ms
The ADTS driver outperforms the UDT driver running over IPoIB by more than 100%
H. Subramoni, P. Lai, R. Kettimuthu and D. K. Panda, High Performance Data Transfer in
Grid Environment Using GridFTP over InfiniBand, Int'l Symposium on Cluster Computing and
the Grid (CCGrid), May 2010.
136CCGrid '11
137. • Low-level Network Performance
• Clusters with Message Passing Interface (MPI)
• Datacenters with Sockets Direct Protocol (SDP) and TCP/IP
(IPoIB)
• InfiniBand in WAN and Grid-FTP
• Cloud Computing: Hadoop and Memcached
CCGrid '11 137
Case Studies
138. A New Approach towards OFA in Cloud
[Figure: three software designs compared; current cloud software design: application over sockets over a 1/10 GigE network; current approach towards OFA in cloud: application over accelerated sockets over verbs/hardware offload on 10 GigE or InfiniBand; our approach towards OFA in cloud: application directly over the verbs interface on 10 GigE or InfiniBand]
• Sockets not designed for high-performance
– Stream semantics often mismatch for upper layers (Memcached, Hadoop)
– Zero-copy not available for non-blocking sockets (Memcached)
• Significant consolidation in cloud system software
– Hadoop and Memcached are developer facing APIs, not sockets
– Improving Hadoop and Memcached will benefit many applications immediately!
CCGrid '11 138
139. Memcached Design Using Verbs
• Server and client perform a negotiation protocol
– Master thread assigns clients to appropriate worker thread
• Once a client is assigned a verbs worker thread, it can communicate directly
and is “bound” to that thread
• All other Memcached data structures are shared among RDMA and Sockets
worker threads
[Figure: Memcached server design; a master thread assigns sockets clients to sockets worker threads and RDMA clients to verbs worker threads, after which each client communicates directly with its assigned worker thread; all worker threads share the Memcached data structures (memory slabs, items)]
CCGrid '11 139
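The operation being timed on the next two slides is the standard memcached get, issued from an ordinary client; the sketch below uses the libmemcached client API as an illustration (server address, port, key and value are placeholders, and nothing here is specific to the verbs-based server design).

    /* Minimal libmemcached client sketch: set a key, then get it back.
     * Server address, port, key and value are placeholders. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <libmemcached/memcached.h>

    int main(void)
    {
        memcached_st *memc = memcached_create(NULL);
        memcached_server_add(memc, "192.168.1.10", 11211);    /* placeholder */

        const char *key = "k1", *val = "v1";
        memcached_set(memc, key, strlen(key), val, strlen(val),
                      (time_t)0, (uint32_t)0);

        size_t vlen;
        uint32_t flags;
        memcached_return_t rc;
        char *out = memcached_get(memc, key, strlen(key), &vlen, &flags, &rc);
        if (rc == MEMCACHED_SUCCESS)
            printf("got %zu bytes\n", vlen);

        free(out);                 /* memcached_get returns malloc'ed data */
        memcached_free(memc);
        return 0;
    }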
140. Memcached Get Latency
• Memcached Get latency
– 4 bytes – DDR: 6 us; QDR: 5 us
– 4 KB – DDR: 20 us; QDR: 12 us
• Almost a factor of four improvement over 10GE (TOE) for 4 KB
• We are in the process of evaluating iWARP on 10GE
Intel Clovertown cluster (IB: DDR) and Intel Westmere cluster (IB: QDR)
[Figure: Memcached get latency (us) versus message size (1 byte to 4 KB) for 1G, 10G-TOE, IPoIB, SDP and the OSU verbs-based design on the two clusters]
CCGrid '11 140
141. Memcached Get TPS
• Memcached Get transactions per second for 4 bytes
– On IB DDR about 600K/s for 16 clients
– On IB QDR 1.9M/s for 16 clients
• Almost a factor of six improvement over 10GE (TOE)
• We are in the process of evaluating iWARP on 10GE
Intel Clovertown cluster (IB: DDR) and Intel Westmere cluster (IB: QDR)
[Figure: Memcached get transactions per second (thousands of TPS) for 8 and 16 clients over SDP, IPoIB, 10G-TOE and the OSU verbs-based design on the two clusters]
CCGrid '11 141
142. Hadoop: Java Communication Benchmark
• Sockets level ping-pong bandwidth test
• Java performance depends on usage of NIO (allocateDirect)
• C and Java versions of the benchmark have similar performance
• HDFS does not use direct allocated blocks or NIO on DataNode
S. Sur, H. Wang, J. Huang, X. Ouyang and D. K. Panda, “Can High-Performance
Interconnects Benefit Hadoop Distributed File System?”, MASVDC '10, in conjunction with
MICRO 2010, Atlanta, GA.
[Figure: ping-pong bandwidth (MB/s) versus message size (1 byte to 4 MB) for the C and Java versions of the benchmark over 1GE, IPoIB, SDP, 10GE-TOE and SDP with NIO]
CCGrid '11 142
143. Hadoop: DFS IO Write Performance
• DFS IO included in Hadoop, measures sequential access throughput
• We have two map tasks each writing to a file of increasing size (1-10GB)
• Significant improvement with IPoIB, SDP and 10GigE
• With SSD, performance improvement is almost seven or eight fold!
• SSD benefits not seen without using high-performance interconnect!
• In line with the comment in the Google keynote about I/O performance
[Figure: average DFS IO write throughput (MB/sec) versus file size (1-10 GB) on four DataNodes, using HDD and SSD, over 1GE, IPoIB, SDP and 10GE-TOE]
CCGrid '11 143
144. Hadoop: RandomWriter Performance
• Each map generates 1GB of random binary data and writes to HDFS
• SSD improves execution time by 50% with 1GigE for two DataNodes
• For four DataNodes, benefits are observed only with HPC interconnect
• IPoIB, SDP and 10GigE can improve performance by 59% on four DataNodes
[Figure: RandomWriter execution time (sec) on two and four DataNodes, using HDD and SSD, over 1GE, IPoIB, SDP and 10GE-TOE]
CCGrid '11 144
145. Hadoop Sort Benchmark
• Sort: baseline benchmark for Hadoop
• Sort phase: I/O bound; Reduce phase: communication bound
• SSD improves performance by 28% using 1GigE with two DataNodes
• Benefit of 50% on four DataNodes using SDP, IPoIB or 10GigE
[Figure: Sort execution time (sec) on two and four DataNodes, using HDD and SSD, over 1GE, IPoIB, SDP and 10GE-TOE]
CCGrid '11 145
146. • Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installations
• Sample Case Studies and Performance Numbers
• Conclusions and Final Q&A
CCGrid '11
Presentation Overview
146
147. • Presented network architectures & trends for Clusters,
Grid, Multi-tier Datacenters and Cloud Computing Systems
• Presented background and details of IB and HSE
– Highlighted the main features of IB and HSE and their convergence
– Gave an overview of IB and HSE hardware/software products
– Discussed sample performance numbers in designing various high-
end systems with IB and HSE
• IB and HSE are emerging as new architectures leading to a
new generation of networked computing systems, opening
many research issues needing novel solutions
CCGrid '11
Concluding Remarks
147
149. CCGrid '11
Personnel Acknowledgments
Current Students
– N. Dandapanthula (M.S.)
– R. Darbha (M.S.)
– V. Dhanraj (M.S.)
– J. Huang (Ph.D.)
– J. Jose (Ph.D.)
– K. Kandalla (Ph.D.)
– M. Luo (Ph.D.)
– V. Meshram (M.S.)
– X. Ouyang (Ph.D.)
– S. Pai (M.S.)
– S. Potluri (Ph.D.)
– R. Rajachandrasekhar (Ph.D.)
– A. Singh (Ph.D.)
– H. Subramoni (Ph.D.)
Past Students
– P. Balaji (Ph.D.)
– D. Buntinas (Ph.D.)
– S. Bhagvat (M.S.)
– L. Chai (Ph.D.)
– B. Chandrasekharan (M.S.)
– T. Gangadharappa (M.S.)
– K. Gopalakrishnan (M.S.)
– W. Huang (Ph.D.)
– W. Jiang (M.S.)
– S. Kini (M.S.)
– M. Koop (Ph.D.)
– R. Kumar (M.S.)
– S. Krishnamoorthy (M.S.)
– P. Lai (Ph. D.)
– J. Liu (Ph.D.)
– A. Mamidala (Ph.D.)
– G. Marsh (M.S.)
– S. Naravula (Ph.D.)
– R. Noronha (Ph.D.)
– G. Santhanaraman (Ph.D.)
– J. Sridhar (M.S.)
– S. Sur (Ph.D.)
– K. Vaidyanathan (Ph.D.)
– A. Vishnu (Ph.D.)
– J. Wu (Ph.D.)
– W. Yu (Ph.D.)
149
Current Research Scientist
– S. Sur
Current Post-Docs
– H. Wang
– J. Vienne
– X. Besseron
Current Programmers
– M. Arnold
– D. Bureddy
– J. Perkins
Past Post-Docs
– E. Mancini
– S. Marcarelli
– H.-W. Jin