The document discusses network-aware data management for large-scale distributed applications. It provides an outline for a presentation on this topic, covering the performance of VSAN and VVOL storage in virtualized environments, the PetaShare distributed storage system and the Stork data scheduler, data streaming in high-bandwidth networks, and related topics such as network reservations and scheduling. The presenter's background and experience in data transfer scheduling, distributed storage, and high-performance computing networks are also briefly summarized.
Network-aware Data Management for High Throughput Flows, Akamai, Cambridge, ... (balmanme)
The document discusses Mehmet Balman's work on network-aware data management for large-scale distributed applications. It provides background on Balman, including his employment at VMware and affiliations. The presentation outline discusses VSAN and VVOL storage performance in virtualized environments, data streaming in high-bandwidth networks, the Climate100 100Gbps networking demo, and other topics related to network-aware data management.
Linac Coherent Light Source (LCLS) Data Transfer Requirements (inside-BigData.com)
In this deck from the Stanford HPC Conference, Les Cottrell from the SLAC National Accelerator Laboratory at Stanford University presents: Linac Coherent Light Source (LCLS) Data Transfer Requirements.
"Funded by the U.S. Department of Energy (DOE) the LCLS is the world’s first hard X-ray free-electron laser. Its strobe-like pulses are just a few millionths of a billionth of a second long, and a billion times brighter than previous X-ray sources. Scientists use LCLS to take crisp pictures of atomic motions, watch chemical reactions unfold, probe the properties of materials and explore fundamental processes in living things.
Its performance to date, over the first few years of operation, has already provided a breathtaking array of world-leading results, published in the most prestigious academic journals and has inspired other XFEL facilities to be commissioned around the world.
LCLS-II will build from the success of LCLS to ensure that the U.S. maintains a world-leading capability for advanced research in chemistry, materials, biology and energy. It is planned to see first light in 2020.
LCLS-II will provide a major jump in capability – moving from 120 pulses per second to 1 million pulses per second. This will enable researchers to perform experiments in a wide range of fields that are now impossible. The unique capabilities of LCLS-II will yield a host of discoveries to advance technology, new energy solutions and our quality of life.
Analysis of the data will require transporting huge amounts of data from SLAC to supercomputers at other sites to provide near real-time analysis results and feedback to the experiments.
The talk will introduce LCLS and LCLS-II with a short video, discuss its data reduction, collection, data transfer needs and current progress in meeting these needs."
Watch the video: https://youtu.be/LkwwGh7YdPI
Learn more: https://www6.slac.stanford.edu/
and
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Streaming exa-scale data over 100Gbps networks (balmanme)
This document discusses streaming exascale data over 100Gbps networks. It summarizes a demonstration at SC11 where climate simulation data was transferred from NERSC to ANL and ORNL at 83Gbps using a memory-mapped zero-copy network channel called MemzNet. The demonstration showed that efficient transfer of large datasets containing many small files is possible over high-bandwidth networks through parallel streams, decoupling of I/O and network operations, and dynamic data channel management. High performance was achieved by keeping the data channel full through concurrent transfers and by leveraging high-speed networking testbeds such as ANI.
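As an illustration of the decoupling idea, here is a minimal sketch (hypothetical structure, not the actual MemzNet code): reader threads aggregate many small files into fixed-size blocks on a bounded queue, while sender threads drain the blocks over parallel streams, which is what keeps the data channel full.

```python
# Minimal sketch of decoupling I/O from network transfer with a shared
# block queue, in the spirit of MemzNet's block-based data channel.
# All names here are illustrative, not the actual MemzNet API.
import queue
import threading

BLOCK_SIZE = 4 * 1024 * 1024          # aggregate small files into 4 MiB blocks
blocks = queue.Queue(maxsize=64)      # bounded buffer keeps memory use fixed

def reader(paths):
    """I/O side: pack file contents into fixed-size blocks."""
    buf = bytearray()
    for path in paths:
        with open(path, "rb") as f:
            buf += f.read()
        while len(buf) >= BLOCK_SIZE:
            blocks.put(bytes(buf[:BLOCK_SIZE]))
            del buf[:BLOCK_SIZE]
    if buf:
        blocks.put(bytes(buf))
    blocks.put(None)                  # sentinel: no more data

def sender(sock):
    """Network side: drain blocks over one of several parallel streams."""
    while True:
        block = blocks.get()
        if block is None:
            blocks.put(None)          # let sibling senders terminate too
            return
        sock.sendall(block)
```

Because the queue is bounded, fast disks cannot outrun the network (and vice versa), and adding sender threads is how the transfer scales to more parallel streams.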
Open Programmable Architecture for Java-enabled Network Devices (Tal Lavian Ph.D.)
Programmable Network Devices.
Openly Programmable devices enable new types of intelligence on the network.
Changing the Rules of the Game.
The Web Changed Everything
- Browsers: introducing the JVM to browsers allowed dynamic loading of Java Applets to end stations
- Routers: introducing the JVM to routers allows dynamic loading of Java Oplets to routers
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt... (Tal Lavian Ph.D.)
A new architecture is proposed for data-intensive services enabled by next-generation dynamic optical networks. It:
Encapsulates “optical network resources” into a service framework to support dynamically provisioned and advanced data-intensive transport services
Provides a generalized framework for high performance applications over next generation networks, not necessarily optical end-to-end
Supports both on-demand and scheduled data retrieval
Supports a meshed wavelength switched network capable of establishing an end-to-end lightpath in seconds
Supports bulk data-transfer facilities using lambda-switched networks
Supports out-of-band tools for adaptive placement of data replicas
Offers network resources as Grid services for Grid computing
The document summarizes the architecture and configuration of a large-scale data warehouse implemented at Yahoo using Oracle RAC on IBM x3850 servers. Key aspects included 16-node Oracle RAC with InfiniBand networking, EMC storage, large memory and CPU configurations to support multi-terabyte datasets and high query concurrency. Comprehensive testing was performed to validate performance and scalability requirements.
The document discusses two proprietary technologies from TIMMES, Inc. called ODEN and MAGNUS that improve data transmission efficiency over networks. ODEN utilizes compression and TCP acceleration to improve bandwidth efficiency. MAGNUS is an evolution of ODEN that requires software on only one side and uses TCP acceleration to reduce latency effects. Third party tests showed the technologies increased throughput by 200-2000% over FTP and HTTP for various file types including video and imagery files.
The document discusses how application architects traditionally focused on solving IO bottlenecks in servers by offloading processing to intelligent network interface cards. With modern distributed applications spanning thousands of servers, application architects now must consider network topology, segmentation, and control plane protocols to optimize latency and bandwidth. The rise of virtualization and cloud computing has changed traffic patterns in datacenters from north-south traffic to dominant east-west traffic between servers. This requires new datacenter fabric designs beyond the traditional three-tiered topology.
Improving Passive Packet Capture: Beyond Device Polling (Hargyo T. Nugroho)
The document discusses improving passive packet capture performance beyond device polling. It proposes a "Socket Ring" approach using PF_RING to create a ring buffer on the network interface card driver. This allows captured packets to bypass the kernel and be directly accessed by userspace applications via memory mapping, improving performance over traditional approaches. Experimental results found the PF_RING approach captured packets much faster than Linux's standard approach, especially for medium and large packets, though some packets were still lost. The approach requires a real-time kernel patch and performance is ultimately limited by network drivers and how the kernel fetches packets.
This technical whitepaper compares Aspera FASP, a high-speed transport protocol, to alternative TCP-based and UDP-based file transfer technologies. It finds that while TCP and high-speed TCP variants can improve throughput over standard TCP in low-loss networks, their performance degrades significantly in wide-area networks with higher latency and packet loss. UDP-based solutions also struggle to achieve high throughput and efficiency across different network conditions due to poor congestion control. In contrast, Aspera FASP is able to achieve maximum throughput that is independent of network characteristics like latency and packet loss, making it optimal for reliable, high-speed transfer of large files over IP networks.
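The latency and loss sensitivity described above follows from the well-known Mathis et al. (1997) bound on standard TCP throughput. The short calculation below uses illustrative numbers, not figures from the whitepaper, to show why loss that is harmless on a LAN cripples a long-haul link.

```python
# Rough upper bound on standard TCP throughput (Mathis et al., 1997):
#   throughput <= MSS / (RTT * sqrt(p))
# where MSS is the segment size, RTT the round-trip time, p the loss rate.
import math

def tcp_throughput_bps(mss_bytes, rtt_s, loss_rate):
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))

for rtt_ms, p in [(1, 1e-4), (100, 1e-4), (100, 1e-2)]:
    bps = tcp_throughput_bps(1460, rtt_ms / 1000, p)
    print(f"RTT={rtt_ms:3d} ms, loss={p:.4f}: ~{bps / 1e6:8.1f} Mbit/s")
# On the 1 ms path with 0.01% loss the bound exceeds 1 Gbit/s; stretching
# RTT to 100 ms drops it to ~12 Mbit/s, and 1% loss leaves ~1 Mbit/s.
```

This RTT and sqrt(loss) dependence is exactly what rate-based protocols such as FASP are designed to sidestep.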
DWDM-RAM: Enabling Grid Services with Dynamic Optical Networks (Tal Lavian Ph.D.)
Packet-switching technology: a great solution for small-burst communication such as email and telnet.
Data-intensive grid applications: involve moving massive amounts of data and require high, sustained bandwidth.
DWDM: basically circuit switching; enables QoS at the physical layer; provides high and sustained bandwidth.
DWDM based on dynamic wavelength switching: enables dedicated optical paths to be allocated dynamically.
A Whole Lot of Ports: Juniper Networks QFabric System Assessment (Juniper Networks)
Juniper Networks commissioned Network Test to assess the performance, interoperability, and usability of its QFabric System, a converged switch fabric for cloud and large data center applications tested with 1,536 10-Gbit/s Ethernet ports.
Even at this unprecedented scale – by far the largest ever in a public switch test – this project loaded the QFabric System to only one-quarter of its maximum capacity of 6,144 10-Gbit/s Ethernet ports.
Using industry-standard RFC benchmarks representing the most rigorous possible test cases, engineers stress-tested QFabric System performance in terms of unicast and multicast throughput and latency with separate events for Layer 2 and Layer 3 traffic. Engineers also assessed interoperability, a key consideration when adding QFabric technology incrementally into existing data center networks, and evaluated device management.
Improving Efficiency of Machine Learning Algorithms using HPCC Systems (HPCC Systems)
1) The document discusses improving the efficiency of machine learning algorithms using the HPCC Systems platform through parallelization.
2) It describes the HPCC Systems architecture and its advantages for distributed machine learning.
3) A parallel DBSCAN algorithm is implemented on the HPCC platform which shows improved performance over the serial algorithm, with execution times decreasing as more nodes are used.
This document summarizes the DevoFlow paper, which proposes techniques to scale flow management for high-performance networks. It finds that per-flow management in OpenFlow introduces high overheads. DevoFlow aims to balance network control, statistics collection, and switch overhead by devolving most flow control to switches while maintaining partial visibility of significant flows. Simulation results show DevoFlow can reduce flow scheduling overheads compared to per-flow control, while still achieving high performance.
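As a rough illustration of devolved flow control, the sketch below (names and threshold are illustrative, not from the paper) lets the switch count bytes per flow and escalate a flow to the controller only once it crosses a significance threshold, so the controller sees elephant flows without per-flow overhead for everything else.

```python
# Sketch of DevoFlow-style devolved flow handling: the switch forwards
# most flows on its own and reports only flows that exceed a byte
# threshold to the controller. Threshold and structures are illustrative.
from collections import defaultdict

ELEPHANT_BYTES = 128 * 1024   # e.g. 128 KB transferred marks a significant flow

flow_bytes = defaultdict(int)
reported = set()

def on_packet(flow_id, length, controller):
    """Per-packet path in the switch: count bytes, escalate big flows once."""
    flow_bytes[flow_id] += length
    if flow_id not in reported and flow_bytes[flow_id] >= ELEPHANT_BYTES:
        reported.add(flow_id)
        controller.reschedule(flow_id, flow_bytes[flow_id])

class Controller:
    def reschedule(self, flow_id, nbytes):
        print(f"controller: rerouting elephant flow {flow_id} ({nbytes} bytes)")

ctrl = Controller()
for _ in range(2000):
    on_packet("10.0.0.1->10.0.0.2:443", 100, ctrl)   # 200 KB total: escalated once
```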
Grid optical network service architecture for data intensive applications (Tal Lavian Ph.D.)
Integrated SW system provides the “glue”: the dynamic optical network as a fundamental Grid service in data-intensive Grid applications, to be scheduled, managed, and coordinated to support collaborative operations.
From super-computer to super-network:
In the past, computer processors were the fastest part and peripherals were the bottleneck.
In the future, optical networks will be the fastest part; computers, processors, storage, visualization, and instrumentation become the slower “peripherals”.
eScience cyber-infrastructure focuses on computation, storage, data, analysis, and workflow.
The network is vital for better eScience.
Virtualization in 4-4 1-4 Data Center Network (Ankita Mahajan)
4-4 1-4 delivers strong performance guarantees in a traditional (non-virtualized) setting, due to location-based static IP address allocation to all network elements.
The document provides an overview of several data center network architectures: Monsoon, VL2, SEATTLE, PortLand, and TRILL. Monsoon proposes a large layer 2 domain with a Clos topology and uses MAC-in-MAC encapsulation and load balancing to improve scalability. VL2 also uses a Clos topology with flat addressing, load balancing, and an end host directory for address resolution. SEATTLE employs flat addressing, automated host discovery, and hash-based address resolution. PortLand uses a tree topology with encoded switch positions and a fabric manager for address mapping. TRILL standardizes encapsulation and IS-IS routing between routing bridges.
VL2: A scalable and flexible Data Center Network (Ankita Mahajan)
This data center network architecture introduces a virtual layer 2.5 in the protocol stack of hosts and uses a directory service to achieve efficient forwarding. It uses separate location and identifier IP addresses.
The document discusses cloud computing and coordination of cloud applications using ZooKeeper. It provides an overview of challenges for cloud computing, architectural styles like client-server and REST, and workflows involving coordination of multiple activities. It then describes ZooKeeper as a distributed coordination service whose consensus core is a Paxos-like atomic broadcast protocol (Zab). ZooKeeper provides reliable coordination through a replicated database, atomic broadcasts, and guarantees such as sequential consistency.
The document discusses Juniper's WANDL and NorthStar solutions for network operators. It provides an overview of the key capabilities of each solution, including:
- WANDL's IP/MPLS View allows operators to design, plan, monitor and optimize multi-vendor Layer 3 networks. It provides network modeling, traffic analysis and automated provisioning capabilities.
- NorthStar combines WANDL's path computation with Juniper's dynamic IP control plane to enable stateful traffic engineering. It provides optimized routing using a centralized path computation approach.
- Both solutions help operators improve network performance, redundancy and efficiency through capabilities like failure simulation, capacity planning, high availability assessment and traffic engineering.
Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores (Han Li)
This slide deck was presented at ACM/IFIP/USENIX Middleware 2013 for the paper "Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores". The abstract of the paper is shown below.
Abstract. Distributed key-value stores (KVSs) have become an important component for data management in cloud applications. Since resources can be provisioned on demand in the cloud, there is a need for efficient node bootstrapping and decommissioning, i.e. incorporating or eliminating the provisioned resources as members of the KVS. This requires that data be handed over and load be shifted across the nodes quickly. However, the data partitioning schemes in current shared-nothing KVSs are not efficient for quick bootstrapping. In this paper, we design a middleware layer that provides a decentralised scheme of auto-sharding with two-phase bootstrapping. We experimentally demonstrate that our scheme reduces bootstrap time and improves load balancing, thereby increasing the scalability of the KVS.
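For background on the handover problem the paper addresses, the sketch below shows, for a plain consistent-hash ring (deliberately not the paper's two-phase auto-sharding scheme), exactly which keys must be handed over when a new node bootstraps.

```python
# Background sketch: in a consistent-hash ring, bootstrapping a node means
# handing over exactly the keys that now map to it. This illustrates the
# data handover the paper optimizes; it is not the paper's scheme.
import bisect
import hashlib

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.points = sorted((h(n), n) for n in nodes)

    def owner(self, key):
        positions = [p for p, _ in self.points]
        i = bisect.bisect(positions, h(key)) % len(self.points)
        return self.points[i][1]

    def bootstrap(self, node, keys):
        """Add a node; return the keys that must be handed over to it."""
        old_owner = {k: self.owner(k) for k in keys}
        bisect.insort(self.points, (h(node), node))
        return [k for k in keys if self.owner(k) == node != old_owner[k]]

ring = Ring(["n1", "n2", "n3"])
keys = [f"key{i}" for i in range(20)]
print("moved to n4:", ring.bootstrap("n4", keys))
```

Only the keys falling in the new node's arc of the ring move; making that handover fast and balanced is precisely what the paper's two-phase bootstrapping targets.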
Scheduling Algorithms in LTE and Future Cellular Networks (INDIAN NAVY)
1) The document discusses scheduling algorithms in LTE and future cellular networks. It provides an overview of key concepts like OFDMA, MIMO, small cells, and the essential elements of LTE including resource blocks and transport channels.
2) It describes important scheduling algorithms used in LTE, such as proportional fair, round robin, best CQI, and QoS-aware algorithms, and explains their objectives and benefits; a minimal proportional-fair sketch follows this summary.
3) Future cellular networks will require capabilities like very high data rates, low latency, and support for applications involving AI, M2M communication, and cloud computing. 5G networks will need to meet requirements like low power consumption and worldwide connectivity.
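As a minimal sketch of the proportional-fair rule referenced above, the code below serves, in each transmission interval, the user with the highest ratio of instantaneous rate to exponentially averaged throughput; the rate model and filter constant are illustrative, not LTE-specified values.

```python
# Minimal proportional-fair (PF) scheduler sketch: each TTI, serve the user
# maximizing instantaneous_rate / average_throughput, then update averages
# with an exponential filter.
import random

N_USERS, TTIS, TC = 4, 1000, 100.0   # users, intervals, averaging window
avg = [1e-6] * N_USERS               # avoid division by zero at start
served = [0] * N_USERS

for _ in range(TTIS):
    # Per-user instantaneous rate, e.g. derived from the reported CQI.
    rate = [random.uniform(1.0, 10.0 * (u + 1)) for u in range(N_USERS)]
    u_star = max(range(N_USERS), key=lambda u: rate[u] / avg[u])
    served[u_star] += 1
    for u in range(N_USERS):
        r = rate[u] if u == u_star else 0.0
        avg[u] = (1 - 1 / TC) * avg[u] + (1 / TC) * r

print("TTIs per user:", served)   # PF balances shares despite unequal channels
```

Dividing by the running average is what distinguishes PF from best-CQI: users with poor channels still get served when their instantaneous rate is high relative to what they usually receive.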
The document describes a proposed unified algorithm for load balancing (LB) and handover optimization (HOO) in Long-Term Evolution (LTE) networks. The algorithm uses a Fuzzy System (FS) tuned by the Q-Learning reinforcement learning algorithm to modify handover parameters at the cell adjacency level. This aims to improve key performance indicators related to both LB and HOO. Simulation results show the proposed joint algorithm provides better performance than independent LB and HOO entities operating simultaneously. The algorithm reduces complexity for the self-organizing network coordination entity by handling LB and HOO jointly rather than as separate functions.
Network Processing on an SPE Core in Cell Broadband Engine (Slide_N)
This document discusses implementing network processing on a Synergistic Processing Element (SPE) core in a Cell Broadband Engine. The key points are:
1) A network interface driver and small protocol stack were implemented on a single SPE to avoid bottlenecks from using the general purpose PowerPC core for network processing.
2) Network processing was able to achieve near wire-speed performance of 8.5 Gbps for TCP and almost wire-speed for UDP, requiring no assistance from the PowerPC core during data transfer.
3) Dedicating an SPE core for network processing can help resolve performance issues from high-speed network interfaces by offloading the processing costs from the general purpose core.
We live in an era where the atomic building elements of silicon computers, e.g. transistors and wires, are no longer visible with traditional optical microscopes and their sizes are measured in just tens of angstroms. In addition, power dissipation per unit volume is bounded by the laws of physics, which has resulted, among other things, in stagnating processor clock frequencies. Adding more and more processor cores that perform simpler and simpler tasks, in an attempt to efficiently fill the available on-chip area, seems to be the current trend taken by the industry.
Analyzing Data Movements and Identifying Techniques for Next-generation Networks (balmanme)
Jan 28th, 2013 - 10:00 am
UC Davis
Title: Analyzing Data Movements and Identifying Techniques for Next-generation Networks
Abstract: The large bandwidth provided by today's networks requires careful evaluation in order to eliminate system overheads and to bring the anticipated high performance to the application layer. As part of the Advanced Networking Initiative (ANI) project, we have conducted a large number of experiments in the initial evaluation of the 100Gbps network prototype.
We needed intense fine-tuning, both in the network and application layers, to take advantage of the higher network capacity. Instead of explicit improvements in every application as we keep changing the underlying link technology, we require novel data movement mechanisms and abstraction layers for end-to-end processing of data. Based on our experience with the 100Gbps network, we have developed an experimental prototype called MemzNet: Memory-mapped Zero-copy Network Channel. MemzNet defines new data access methods in which applications map memory blocks for remote data, in contrast to send/receive semantics. In one of the early demonstrations of 100Gbps network applications, we used the initial implementation of MemzNet, which takes the approach of aggregating files into blocks and providing dynamic data channel management. We observed that MemzNet showed better results, in terms of performance and efficiency, than the current state-of-the-art file-centric data transfer tools for the transfer of climate datasets with many small files. In this talk, I will mainly describe our experience in the 100Gbps tests and present results from the 100Gbps demonstration. I will briefly explain the ANI testbed environment and highlight future research plans.
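A minimal sketch of the block-mapped access semantics described in the abstract, with all names hypothetical: the transfer layer fills blocks of a memory-mapped region as data arrives, and the application reads remote data by block and offset instead of issuing receive calls.

```python
# Sketch of block-mapped access semantics: the application maps a memory
# region and reads remote data by (block_id, offset) instead of issuing
# send/receive calls. Purely illustrative; not the MemzNet implementation.
import mmap

BLOCK = 1 << 20                       # 1 MiB blocks

class MappedChannel:
    def __init__(self, nblocks):
        self.mem = mmap.mmap(-1, nblocks * BLOCK)   # anonymous memory map
        self.ready = set()                          # blocks filled so far

    def fill(self, block_id, payload):
        """Called by the transfer layer as network data arrives."""
        self.mem.seek(block_id * BLOCK)
        self.mem.write(payload.ljust(BLOCK, b"\0"))
        self.ready.add(block_id)

    def read(self, block_id, offset, size):
        """Application-side access: read mapped remote data in place."""
        assert block_id in self.ready, "block not yet transferred"
        start = block_id * BLOCK + offset
        return self.mem[start:start + size]

chan = MappedChannel(nblocks=4)
chan.fill(0, b"climate-dataset-bytes")
print(chan.read(0, 0, 7))             # b'climate'
```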
Bio: Mehmet Balman is a researcher working as a computer engineer in the Computational Research Division at Lawrence Berkeley National Laboratory. His recent work particularly deals with efficient data transfer mechanisms, high-performance network protocols, bandwidth reservation, network virtualization, scheduling, and resource management for large-scale applications. He received his doctoral degree in computer science from Louisiana State University (LSU) in 2010. He gained several years of industrial experience as a system administrator and R&D specialist at various software companies before joining LSU. He also worked as a summer intern at Los Alamos National Laboratory.
This document discusses streaming exascale data over 100Gbps networks for climate science applications. It summarizes that:
1) Data volume is increasing exponentially for climate applications, posing challenges for data management.
2) Streaming climate simulation data, which consists of small and large irregularly sized files, efficiently over high-bandwidth networks could benefit climate science.
3) A framework called MemzNet was developed to efficiently move climate files over 100Gbps networks by decoupling I/O and networking operations and dynamically managing data transfer. MemzNet was able to saturate a 100Gbps testbed network.
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt... (Tal Lavian Ph.D.)
A new architecture is proposed for data-intensive services enabled by next-generation dynamic optical networks. It:
Offers a Lambda scheduling service over Lambda Grids
Supports both on-demand and scheduled data retrieval
Supports bulk data-transfer facilities using lambda-switched networks
Provides a generalized framework for high performance applications over next generation networks, not necessarily optical end-to-end
Supports out-of-band tools for adaptive placement of data replicas
An Architecture for Data Intensive Service Enabled by Next Generation Optical... (Tal Lavian Ph.D.)
DWDM-RAM - An architecture for data intensive Grids enabled by next generation dynamic optical networks, incorporating new methods for lightpath provisioning.
DWDM-RAM: an architecture designed to meet the networking challenges of extremely large-scale Grid applications. Traditional network infrastructure cannot meet these demands, especially the requirements of intensive data flows.
DWDM-RAM Components Include:
Data management services
Intelligent middleware
Dynamic lightpath provisioning
State-of-the-art photonic technologies
Wide-area photonic testbed implementation
Enhancing Performance with Globus and the Science DMZ (Globus)
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
This document discusses data placement scheduling between distributed repositories. It introduces Stork, a batch scheduler for data placement activities that supports plug-in data transfer modules and scheduling of data movement jobs. The document discusses techniques used by Stork such as throttling concurrent transfers, fault tolerance, job aggregation, and adaptive tuning of data transfer protocols. It also covers topics like network reservation, failure awareness, and directions for future work including priority-based scheduling and advance resource reservation.
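Two of the techniques named above, throttling concurrent transfers and aggregating small data placement jobs, can be sketched as follows; the job format, limits, and transfer call are illustrative, not Stork's actual interfaces.

```python
# Sketch of two Stork-style ideas: throttling the number of concurrent
# transfers and aggregating many small placement jobs into one transfer.
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 4          # throttle: at most 4 transfers in flight
AGGREGATE_BYTES = 64 << 20  # pack small jobs until ~64 MiB per transfer

def aggregate(jobs):
    """Group (src, size) placement jobs into batches of bounded size."""
    batch, total = [], 0
    for job in sorted(jobs, key=lambda j: j[1]):
        batch.append(job)
        total += job[1]
        if total >= AGGREGATE_BYTES:
            yield batch
            batch, total = [], 0
    if batch:
        yield batch

def transfer(batch):
    print(f"transferring {len(batch)} files, {sum(s for _, s in batch)} bytes")

jobs = [(f"file{i}", (i % 50 + 1) << 16) for i in range(500)]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    pool.map(transfer, aggregate(jobs))
```

Aggregation amortizes per-transfer protocol overhead across many small files, while the worker pool caps concurrency so the endpoints and the network are not overloaded.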
The Challenges of SDN/OpenFlow in an Operational and Large-scale Network (Open Networking Summits)
Jun Bi
Professor & Director
Tsinghua University
Outline
• Intra-AS (campus level) IPv6 source address validation using OpenFlow (with extension)
– Good for introducing new IP services to network
• Planning next step if we run SDN as a common infrastructure for new services and architectures
– Some personal viewpoints and thoughts on design challenges
– Forwarding abstraction for Post-IP architectures
– Control abstraction for scalable NOS and programmable/manageable virtualization platform
– Inter-AS policies negotiation abstraction
ONS2015: http://bit.ly/ons2015sd
ONS Inspire! Webinars: http://bit.ly/oiw-sd
Watch the talk (video) on ONS Content Archives: http://bit.ly/ons-archives-sd
High performance network of Cloud Native Taiwan User Group (HungWei Chiu)
The document discusses high performance networking and summarizes a presentation about improving network performance. It describes drawbacks of the current Linux network stack, including kernel overhead and data copying. It then discusses approaches like DPDK and RDMA that can help improve performance by reducing overhead and enabling zero-copy data transfers. A case study is presented on using RDMA to improve TensorFlow performance by eliminating unnecessary data copies between devices.
This document discusses optimizing Linux AMIs for performance at Netflix. It begins by providing background on Netflix and explaining why tuning the AMI is important given Netflix runs tens of thousands of instances globally with varying workloads. It then outlines some of the key tools and techniques used to bake performance optimizations into the base AMI, including kernel tuning to improve efficiency and identify ideal instance types. Specific examples of CFS scheduler, page cache, block layer, memory allocation, and network stack tuning are also covered. The document concludes by discussing future tuning plans and an appendix on profiling tools like perf and SystemTap.
2009-01-28 DOI NBC Red Hat on System z Performance Considerations (Shawn Wells)
Presented with the U.S. Department of the Interior, National Business Center (DOI NBC), which offered a for-fee Linux on System z service to the U.S. Government. This presentation steps through performance management considerations, including: FCP/SCSI single path vs. multipath LVM; filesystem striping; Crypto Express2 Accelerator (CEX2A) SSL handshakes; cryptographic performance (WebSEAL SSL access); and CMM1 & CMMA.
- James Blessing is the Deputy Director of Network Architecture at Future Services. He discussed Ciena's MCP network management software, the need for automation of network provisioning through APIs, and the JiscMail NETWORK-AUTOMATION mailing list as a resource.
- The document then covered topics like Netpath services, layer 2 and 3 VPNs, network function virtualization, IPv6 adoption, the Janet end-to-end performance initiative, science DMZ principles, network performance monitoring with perfSONAR, and working with the GÉANT project.
Stephan Ewen - Experiences running Flink at Very Large Scale (Ververica)
This talk shares experiences from deploying and tuning Flink stream processing applications at very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk will explain what aspects currently render a job as particularly demanding, show how to configure and tune a large-scale Flink job, and outline what the Flink community is working on to make the out-of-the-box experience as smooth as possible. We will, for example, dive into: analyzing and tuning checkpointing; selecting and configuring state backends; understanding common bottlenecks; and understanding and configuring network parameters.
DNMTT - Synchrophasor Data Delivery Efficiency GEP Testing Results at Peak RC (Grid Protection Alliance)
GEP was tested against IEEE C37.118 for wide-area distribution of phasor data. Results showed that GEP had much less data loss than C37.118 over the same network conditions. GEP also required 60-70% less bandwidth for large and medium data flows compared to C37.118. There was no significant impact on servers between the two protocols. In conclusion, GEP represents an improved target for high-volume synchrophasor data distribution due to its robust and scalable pub/sub design.
Kickstart your Kafka with Faker Data | Francesco Tisiot, Aiven.io (HostedbyConfluent)
We all love to play with the shiny toys, but an event stream with no events is a sorry sight. In this session you’ll see how to create your own streaming dataset for Apache Kafka using Python and the Faker library. You’ll learn how to create a random data producer and define the structure and rate of its message delivery. Randomly-generated data is often hilarious in its own right, and it adds just the right amount of fun to any Kafka and its integrations!
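A minimal sketch of the kind of producer the session describes, using the Faker library with the kafka-python client; the broker address, topic name, message schema, and delivery rate are assumptions for illustration.

```python
# Minimal fake-event producer in the spirit of the talk, using Faker with
# the kafka-python client. Broker, topic, and schema are illustrative.
import json
import time

from faker import Faker
from kafka import KafkaProducer

fake = Faker()
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    event = {                      # structure of each randomly generated message
        "name": fake.name(),
        "address": fake.address(),
        "ts": time.time(),
    }
    producer.send("fake-events", value=event)
    time.sleep(0.5)                # delivery rate: two events per second
```

Adjusting the sleep interval and the Faker fields is how you control the rate and the structure of the stream, which is the point of the exercise.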
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw... (Tal Lavian Ph.D.)
Dynamic High-Performance Networks :
Support data-intensive Grid applications
Gives adequate and uncontested bandwidth to an application’s burst
Employs circuit-switching of large flows of data to avoid the overheads of breaking flows into small packets and the delays of routing
Is capable of automatic end-to-end path provisioning
Is capable of automatic wavelength switching
Provides a set of protocols for managing dynamically provisioned wavelengths
DWDM-RAM :
Encapsulates “optical network resources” into a service framework to support dynamically provisioned and advanced data-intensive transport services
Offers network resources as Grid services for Grid computing
Allows cooperation of distributed resources
Provides a generalized framework for high performance applications over next generation networks, not necessarily optical end-to-end
Yields good overall utilization of network resources
Simulating the behavior of satellite Internet links to small islands (APNIC)
This document summarizes a talk about simulating satellite internet links to small islands using a hardware-based simulation. The simulation aims to demonstrate how coding and performance enhancing proxies impact link utilization and packet loss. It consists of configuring the simulated satellite link parameters, running background traffic from servers to clients to generate demand, capturing traffic on both ends, and measuring the impact of coding and proxies on large file transfers and ping times. Preliminary results show that medium earth orbit links have higher goodput than geostationary links under high load, and that performance enhancing proxies help large file transfers without significantly impacting overall throughput. Future work will explore forward error correction coding and balancing redundancy with spare capacity.
DPDK Summit 2015 - Aspera - Charles Shiflett (Jim St. Leger)
DPDK Summit 2015 in San Francisco.
Presentation by Charles Shiflett, Aspera.
For additional details and the video recording please visit www.dpdksummit.com.
This tutorial gives a brief and interesting introduction to modern stream computing technologies. Participants can learn the essential concepts and methodologies for designing and building an advanced stream processing system. The tutorial unveils the key fundamentals behind various kinds of design choices. Some forecasts of technology developments in this domain are also introduced in the last section of the tutorial.
Impact of Grid Computing on Network Operators and HW Vendors (Tal Lavian Ph.D.)
The “Network” is a prime resource for large-scale distributed systems.
An integrated SW system provides the “glue”:
Dynamic optical network as a fundamental Grid service in data-intensive Grid applications, to be scheduled, managed, and coordinated to support collaborative operations.
The document discusses a 100Gbps testbed network built by the Energy Sciences Network (ESnet) to support the transfer of massive datasets used in scientific research. The network connects three supercomputing centers and has been 80% utilized since launching in January. A project used the network to demonstrate a 35 terabyte transfer between two sites that took 30 minutes, compared to an estimated 5 hours on a 10Gbps network. The 100Gbps network provides scientists with critical infrastructure to enable research as datasets continue growing rapidly in size.
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes... (balmanme)
The document discusses the development of the Advanced Networking Initiative (ANI), a 100 Gbps national prototype network and testbed established by the Department of Energy's Energy Sciences Network (ESnet) to support scientific research. Researchers from various fields have used the ANI testbed to test networking technologies and data transfer tools for moving extremely large datasets, such as climate simulation data and radio astronomy data. The testbed has helped researchers optimize their software and protocols for high-speed data transfer over long-distance 100 Gbps networks.
This document summarizes a presentation on Stork 1.0 and beyond for large-scale collaborative science. Stork is a framework for scheduling data placement jobs. It uses modular transfer modules to support different protocols and services. It also includes features like error detection and classification, dynamic tuning of transfer parameters, job aggregation for improved performance, and has been used for data migration in projects like PetaShare. Future work may include improving performance and fault tolerance through distributed scheduling agents.
Available technologies: algorithm for flexible bandwidth reservations for dat... (balmanme)
Scientists at Berkeley Lab developed a flexible reservation algorithm that finds communication paths in time-dependent networks with bandwidth constraints. The algorithm offers reservation options that meet the user's specified requirements for start time, transit time, and bandwidth. It was tested in network simulations and can produce reservation options in under a second for networks with 1000 nodes. The algorithm provides more flexibility than existing reservation systems and allows users to optimize their choices for large-scale data transfers.
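A simplified illustration of the idea, not the Berkeley Lab algorithm itself: given per-slot spare bandwidth on a path, enumerate (start time, duration, bandwidth) options that can carry the requested volume, so a user can pick earliest start, shortest transit time, or lowest reserved rate.

```python
# Simplified illustration of flexible bandwidth reservation: given per-slot
# available bandwidth on a path, enumerate (start, duration, bandwidth)
# options that move `volume`. Values and granularity are illustrative.
def reservation_options(avail_gbps, volume_gbits, max_slots):
    """avail_gbps[t] = spare bandwidth in time slot t (1 slot = 1 s here)."""
    options = []
    for start in range(len(avail_gbps)):
        for dur in range(1, min(max_slots, len(avail_gbps) - start) + 1):
            window = avail_gbps[start:start + dur]
            bw = min(window)                 # sustainable rate over the window
            if bw * dur >= volume_gbits:     # enough capacity to finish?
                options.append((start, dur, volume_gbits / dur))
                break                        # longer windows only need lower rates
    return options

# 10 one-second slots with varying spare capacity, request of 40 Gbit:
spare = [10, 10, 40, 40, 20, 5, 30, 30, 30, 10]
for start, dur, rate in reservation_options(spare, 40, max_slots=4):
    print(f"start t={start}, {dur} slot(s), reserve {rate:.1f} Gbps")
```

Running this prints several feasible reservations, e.g. an immediate 4-slot option at 10 Gbps versus a 1-slot option at 40 Gbps starting at t=2, which is the earliest-completion versus lowest-rate tradeoff the flexible approach exposes.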
Berkeley lab team develops flexible reservation algorithm for advance network... (balmanme)
Researchers at Berkeley Lab developed a new flexible reservation algorithm to help scientists transfer large datasets over networks more efficiently. The algorithm allows users to inquire about bandwidth availability and receive alternative reservation options when initial requests fail. It presents a variety of possible reservation options to choose from based on factors like earliest completion time or highest bandwidth. This flexible approach is being integrated into ESnet's reservation system, OSCARS, to better support the large-scale data needs of scientific research.
This document discusses dynamic adaptation techniques for optimizing data transfer performance over networks. It describes how the number of concurrent data transfer streams can be adjusted dynamically according to changing network conditions, without relying on historical measurements or external profiling. The proposed approach gradually increases the level of parallelism during a transfer to find a near-optimal number of streams based on instant throughput measurements, allowing it to adapt to varying environments and network utilization over time.
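The adaptation loop can be sketched as follows, with measure_throughput standing in for the instant throughput measurements the approach relies on; the stopping rule and the toy throughput model are illustrative.

```python
# Sketch of the dynamic adaptation idea: grow the number of parallel
# streams during the transfer and keep the level where measured throughput
# stops improving, using only live measurements (no historical profiling).
def tune_streams(measure_throughput, max_streams=16, min_gain=1.05):
    """Return a near-optimal stream count from instant measurements."""
    streams, best = 1, measure_throughput(1)
    while streams < max_streams:
        t = measure_throughput(streams + 1)
        if t < best * min_gain:       # <5% gain: more streams stopped paying off
            break
        streams, best = streams + 1, t
    return streams, best

# Illustrative model: throughput saturates around 8 streams.
model = lambda n: 1000 * min(n, 8) / (1 + 0.02 * n)
n, thr = tune_streams(model)
print(f"settled on {n} streams at ~{thr:.0f} Mbit/s")
```

Because the probe runs during the transfer itself, the same loop can be re-entered periodically to track changing network utilization.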
This document discusses data migration in distributed repositories for collaborative science. It describes STORK, a scheduler for data placement activities that dynamically adapts data transfers. STORK aggregates data placement jobs and processes them as a single transfer job to improve performance. It also dynamically sets the number of parallel streams for transfers based on network characteristics. The document presents experiments on the Louisiana Optical Network Initiative demonstrating how STORK optimizes parameters like aggregation count and parallel jobs to reduce total transfer time.
1) Scientific research increasingly relies on large-scale data transfers between collaborating institutions over high-speed networks.
2) ESNet provides high-bandwidth connectivity between DOE sites but needs to efficiently allocate guaranteed bandwidth for transfers.
3) The presentation proposes enhancements to ESNet's OSCARS reservation system to suggest optimal reservations that meet researchers' bandwidth and timing requirements for data transfers.
Berkeley Lab - Computing Sciences Seminar - Reminder
TOMORROW, June 24, 2:00pm - 3:00pm, Bldg. 50F, Room 1647
Date: Wednesday, June 24, 2009
Time: 2:00pm - 3:00pm
Location: Bldg. 50F, Room 1647
Speaker: Mehmet Balman, Department of Computer Science, Louisiana State University
Title: Data Migration between Distributed Repositories for Collaborative Research
Abstract:
Scientific applications, especially in areas such as physics, biology, and astronomy, have become more complex and compute intensive. Often, such applications require geographically distributed resources to satisfy their immense computational requirements. Consequently, these applications also have increasing distributed data-intensive requirements, dealing with petabytes of data. The distributed nature of the resources has made data movement the major bottleneck for end-to-end application performance. Our approach is to use a dynamic network layer where the data placement middleware adapts to changing conditions in the environment. Furthermore, heterogeneous resources and different data access and security protocols are some of the challenges the data placement middleware needs to deal with. Complex middleware is required to orchestrate the use of these storage and network resources between collaborating parties, and to manage the end-to-end distribution of data.
We present a data placement scheduler, for mitigating the data
bottleneck in collaborative peta-scale applications. In this talk,
we will give details on recent research in data scheduling, some use
cases for transferring very large data sets into distributed
repositories, and experiments of effective data movement over 1Gpbs
and 10Gbps networks. We will also describe advanced features
including aggregation of data placement jobs with small data files,
dynamic tuning of data transfer operations to minimize the effect of
network latency, error detection and classification, and restarting
transfer operations after transfer interruptions.
Host of Seminar: Arie Shoshani
------------------------------------------------------------------------
For additional information, such as site access or directions to the conference room, please contact CSSeminars-Help@hpcrd.lbl.gov.
Web Contact: CSSeminars-Help@hpcrd.lbl.gov
_______________________________________________
CSSeminars mailing list
CSSeminars@hpcrdm.lbl.gov
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/csseminars
Balman dissertation Copyright @ 2010 Mehmet Balmanbalmanme
This document discusses scheduling data transfer operations with advance reservation and provisioning. It proposes dividing time into windows within which network bandwidth availability is stable. When a data transfer request is received, the scheduler checks all possible time windows to see whether the request fits within the bandwidth constraints. If no window is available, it tries shifting existing transfers to earlier windows when they have a lower "desire" value, computed from the number of occupied time slots and the order of the window. This allows requests to be scheduled in advance while minimizing disruption to existing transfers.
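A toy version of that window-fitting logic is sketched below. The capacity model and the desire formula (occupied slots weighted by window order) are illustrative assumptions, not the scheduler's exact definitions.

```python
def desire(job, window_index):
    # Assumed desire metric: more occupied time slots and a later
    # window position mean the job "wants" its current window more.
    return job["slots"] * (window_index + 1)

def schedule(windows, new_job):
    """windows: list of {"capacity": float, "jobs": [...]}; a job is
    {"bw": float, "slots": int}. Place new_job in the first window
    with room; otherwise try shifting a lower-desire job earlier."""
    def free(w):
        return w["capacity"] - sum(j["bw"] for j in w["jobs"])

    for i, w in enumerate(windows):
        if free(w) >= new_job["bw"]:
            w["jobs"].append(new_job)
            return i

    for i, w in enumerate(windows):
        movable = [j for j in w["jobs"] if desire(j, i) < desire(new_job, i)]
        for job in sorted(movable, key=lambda j: desire(j, i)):
            for k in range(i):                      # earlier windows only
                if free(windows[k]) >= job["bw"]:
                    w["jobs"].remove(job)
                    windows[k]["jobs"].append(job)
                    break
            if free(w) >= new_job["bw"]:
                w["jobs"].append(new_job)
                return i
    return None
```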
From: "Rachel Lance" <rlance@lbl.gov>
Subject: [CSSeminars] REMINDER: Berkeley Lab - Computing Sciences Seminar - Monday, 8/17/2009, 2:00pm TODAY
Date: Mon, August 17, 2009 1:36 pm
To: CSSeminars@hpcrd.lbl.gov
Berkeley Lab - Computing Sciences Seminar - Reminder
TODAY, August 17, 2:00pm - 3:00pm, Bldg. 50F, Room 1647
Berkeley Lab - Computing Sciences Seminar
Date: Monday, August 17, 2009
Time: 2:00pm - 3:00pm
Location: Bldg. 50F, Room 1647
Speaker: Mehmet Balman, Department of Computer Science, Louisiana State University
Title: Advance Network Reservation and Provisioning for Science
Abstract:
Scientific applications already generate many terabytes and even petabytes of data from supercomputer runs and large-scale experiments. The need for transferring data chunks of ever-increasing sizes through the network shows no sign of abating. Hence, we need high-bandwidth, high-speed networks such as DOE's ESnet (Energy Sciences Network) that manage the available bandwidth effectively. OSCARS (ESnet On-demand Secure Circuits and Advance Reservation System) serves as the network provisioning agent on ESnet. Currently, using OSCARS, a user can specify a desired reservation of bandwidth x MB/sec for a duration of y hours starting at time t. OSCARS checks network availability and capacity for the specified window of time, and allocates it for that user if it is available. Otherwise, it reports to the user that it is unable to make the allocation. Accordingly, it falls upon the user to search for a time frame with the required bandwidth by trial and error, without knowledge of the network's available capacity at any given instant.
We report a novel algorithm where the user specifies the total volume that needs to be transferred, a maximum bandwidth that he/she can use, and a desired time window within which the transfer should be done. The proposed algorithm can find alternate allocation possibilities, including the earliest time for completion or the shortest transfer duration, leaving the choice to the user. The proposed algorithm is quite practical when applied to large networks with thousands of routers and links. We have implemented our algorithm for testing and incorporation into a future version of OSCARS. We will finish the talk with a short demonstration.
Host of Seminar: Arie Shoshani
-----------
1) The Earth System Grid (ESG) supports climate research by providing access to petabytes of climate simulation data distributed across multiple locations worldwide. 2) As climate datasets continue increasing in size, from gigabytes to petabytes, efficient bulk data transfer techniques are needed to replicate and distribute the data. 3) The Bulk Data Mover (BDM) was developed to improve data transfer performance. It uses techniques like parallel TCP streams, adaptive tuning of transfer parameters, and dynamic load balancing.
The document summarizes the agenda for the NDM 2012 workshop on Network-aware Data Management to be held on November 11th, 2012 in Salt Lake City. The workshop will include keynote speeches, invited talks, paper presentations, and a panel discussion on new directions in networking and data management. Topics will include data-intensive applications, transport of big data over dedicated networks, and using networking techniques for data management. The workshop aims to foster collaboration between the network and data management communities.
The document proposes a flexible reservation algorithm to improve advance network reservation systems. It allows clients to specify a maximum bandwidth, data size, earliest start time, and latest end time. The system then finds the reservation that meets these constraints with either the earliest completion time or the shortest transfer duration. Time-dependent graphs are used to model bandwidth availability over time, and Kruskal's and Dijkstra's algorithms are modified to find the maximum-bandwidth path while respecting constraints such as earliest completion.
The document discusses analyzing climate data over fast networks and parallel mesh refinement. It describes two climate analysis applications that are either computationally or data intensive. It then discusses accessing netcdf climate data files from remote repositories over networks, distributing the input files across processes, and using batch processing or clouds to retrieve the remote data. It also describes adaptive mesh refinement used to process large climate data in parallel by distributing the mesh and synchronizing propagation paths between processes.
The document summarizes a workshop on network-aware data management held alongside the SC'11 conference. The workshop addressed challenges in managing large amounts of data across high-bandwidth networks and how to simplify data access and movement. It included keynote and paper presentations on these topics, and a panel discussion on data management challenges for exascale computing and terabit networks. The best paper award was given to a presentation on a fat-tree routing algorithm to alleviate congestion in InfiniBand networks.
Climate100 aims to scale climate applications to utilize 100Gbps network bandwidth. Climate datasets consist of many small files that add up to very large volumes; recent accomplishments nevertheless demonstrate moving terabytes of such data between laboratories in under an hour over 100Gbps links, averaging 83Gbps. The project addresses increasing data sizes and efficient use of network infrastructure with limited resources.
The document discusses using RDMA over Converged Ethernet (RoCE) for high performance data movement between KISTI and LBL. It describes how RDMA allows direct data placement through one-sided operations like RDMA write and read, avoiding CPU overhead. It also discusses challenges in using RDMA over wide area networks and for bulk data transfers, and experiments using GridFTP and a prototype FTP-like transfer application over RDMA.
Network-aware Data Management for Large Scale Distributed Applications, IBM Research-Almaden, San Jose, CA – June 24, 2015
1. Network-aware Data Management for Large-scale Distributed Applications
June 24, 2015
Mehmet Balman
http://balman.info
Senior Performance Engineer at VMware Inc.
Guest/Affiliate at Berkeley Lab
2. About me:
• 2013: Performance, Central Engineering, VMware, Palo Alto, CA
• 2009: Computational Research Division (CRD) at Lawrence Berkeley National Laboratory (LBNL)
• 2005: Center for Computation & Technology (CCT), Baton Rouge, LA
• Computer Science, Louisiana State University (2010, 2008)
• Bogazici University, Istanbul, Turkey (2006, 2000)
• Data Transfer Scheduling with Advance Reservation and Provisioning, Ph.D.
• Failure-Awareness and Dynamic Adaptation in Data Scheduling, M.S.
• Parallel Tetrahedral Mesh Refinement, M.S.
3. Why Network-aware?
Networking is one of the major components in many of the solutions today:
• Distributed data and compute resources
• Collaboration: data to be shared between remote sites
• Data centers are complex network infrastructures
• What further steps are necessary to take full advantage of future networking infrastructure?
• How are we going to deal with performance problems?
• How can we enhance data management services and make them network-aware?
New collaborations between data management and networking communities.
4. Two major players:
• Abstraction and Programmability
• Rapid Development, Intelligent services
• Orchestrating compute, storage, and network resources together
• Integration and deployment of complex workflows
• Virtualization (+containers)
• Distributed storage (storage wars)
• Open Source (if you can't fix it, you don't own it)
• Performance Gap:
  • Limitation in current system software vs. foreseen speed: hardware is fast, software is slow
  • Latency vs. throughput mismatch will lead to new innovations
5. Outline
• VSAN + VVOL Storage Performance in Virtualized Environments
• PetaShare Distributed Storage + Stork Data Scheduler: Adaptive Tuning + Advanced Buffers
• Data Streaming in High-bandwidth Networks
  • Climate100: Advanced Networking Initiative (ANI) and 100Gbps Demo
  • MemzNet: Memory-Mapped Network Zero-copy Channels
  • Core Affinity and End System Tuning in High-Throughput Flows
• Network Reservation and Online Scheduling (QoS)
  • FlexRes: A Flexible Network Reservation Algorithm
  • SchedSim: Online Scheduling with Advance Provisioning
7. VSAN performance work in a nutshell
(Observer image: blog.vmware.com)
• Every write operation needs to go over the network (and the network is not free)
• Each layer (cache, disk, object management, etc.) needs resources (CPU, memory)
• Resource limitations vs. latency effect
• Needs to support thousands of VMs
Placement of objects:
• Which host?
• Which disk/SSD in the host?
What if there are failures and migrations, and if we need to rebalance?
8. VVOL: virtual volumes
(VVOL image: blog.vmware.com)
Offloading control operations to the storage array:
• powerOn
• powerOff
• delete
• clone
9. VVOL performance work
• Effect of the latency in the control path
• linked clones vs. VVOL clones
[Diagram: vSphere host and storage array with a VASA provider (VP); the data path and the control path are separate]
• Optimize service latencies
• Batching (disklib)
• Use concurrent operations
10. PetaShare + Stork Data Scheduler
Aggregation in the data path: an advance buffer cache in the Petafs and Petashell clients aggregates I/O requests to minimize the number of network messages.
11. Adaptive Tuning + Advanced Buffer
• Adaptive tuning for bulk transfers
• Buffer cache for remote I/O
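To make the aggregation idea from the two slides above concrete, here is a minimal sketch of a write-back buffer that coalesces small adjacent writes into one network message; the send callback and the 1MB limit are illustrative assumptions, not the Petafs/Petashell implementation.

```python
class AggregatingWriter:
    """Coalesce small sequential writes into one network message.
    `send` is a hypothetical callable that ships (offset, bytes) to
    the remote storage server in a single message."""
    def __init__(self, send, limit=1 << 20):
        self.send, self.limit = send, limit
        self.start, self.buf = None, bytearray()

    def write(self, offset, data):
        contiguous = (self.start is not None and
                      offset == self.start + len(self.buf))
        if not contiguous or len(self.buf) + len(data) > self.limit:
            self.flush()                      # ship what we have so far
        if self.start is None:
            self.start = offset
        self.buf += data

    def flush(self):
        if self.buf:
            self.send(self.start, bytes(self.buf))  # one message, many writes
        self.start, self.buf = None, bytearray()

# Example: 1000 sequential 1KB writes become a single 1MB-ish message.
sent = []
w = AggregatingWriter(lambda off, b: sent.append((off, len(b))))
for i in range(1000):
    w.write(i * 1024, b"x" * 1024)
w.flush()
print(sent)   # [(0, 1024000)]
```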
12. Outline
• VSAN + VVOL Storage Performance in Virtualized Environments
• PetaShare Distributed Storage + Stork Data Scheduler: Adaptive Tuning + Advanced Buffers
• Data Streaming in High-bandwidth Networks
  • Climate100: Advanced Networking Initiative (ANI) and 100Gbps Demo
  • MemzNet: Memory-Mapped Network Zero-copy Channels
  • Core Affinity and End System Tuning in High-Throughput Flows
• Network Reservation and Online Scheduling (QoS)
  • FlexRes: A Flexible Network Reservation Algorithm
  • SchedSim: Online Scheduling with Advance Provisioning
13. 100Gbps networking has finally arrived! (Applications' perspective)
Increasing the bandwidth is not sufficient by itself; we need careful evaluation of high-bandwidth networks from the applications' perspective.
1Gbps to 10Gbps transition (10 years ago): applications did not run 10 times faster just because there was more bandwidth available.
14. ANI 100Gbps Demo
• 100Gbps demo by ESnet and Internet2
• Application design issues and host tuning strategies to scale to 100Gbps rates
• Visualization of remotely located data (cosmology)
• Data movement of large datasets with many files (climate analysis)
15. Earth System Grid Federation (ESGF)
• Over 2,700 sites
• 25,000 users
• IPCC Fifth Assessment Report (AR5): 2PB
• IPCC Fourth Assessment Report (AR4): 35TB
• Remote data analysis
• Bulk data movement
17. The lots-of-small-files problem! File-centric tools?
[Diagram: with file-centric tools (FTP, RPC), each file costs a request/response round trip: request a file, send file; request data, send data]
• Keep the network pipe full
• We want out-of-order and asynchronous send/receive
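The contrast on the slide above, one outstanding request per file versus a pipe kept full of asynchronous requests, can be modeled with a toy asyncio script; the 50ms latency and block counts are made up for illustration.

```python
import asyncio, time

LATENCY = 0.05  # pretend 50 ms round trip per request

async def fetch(block_id):
    await asyncio.sleep(LATENCY)          # simulated network round trip
    return block_id, b"data"

async def file_centric(blocks):
    # One outstanding request at a time: latency is paid once per block.
    for b in blocks:
        await fetch(b)

async def pipelined(blocks, depth=16):
    # Keep `depth` asynchronous requests in flight; completions may
    # arrive out of order, which is fine when bookkeeping travels
    # with each block.
    pending = set()
    for b in blocks:
        pending.add(asyncio.ensure_future(fetch(b)))
        if len(pending) >= depth:
            _, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED)
    if pending:
        await asyncio.gather(*pending)

for runner in (file_centric, pipelined):
    t0 = time.perf_counter()
    asyncio.run(runner(list(range(64))))
    print(runner.__name__, f"{time.perf_counter() - t0:.2f}s")
```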
18. Many Concurrent Streams
[Figure: (a) total throughput vs. the number of concurrent memory-to-memory transfers; (b) interface traffic, packets per second (blue) and bytes per second, over a single NIC with different numbers of concurrent transfers.] Three hosts, each with 4 available NICs, and a total of 10 10Gbps NIC pairs were used to saturate the 100Gbps pipe in the ANI Testbed. 10 data movement jobs, each corresponding to a NIC pair, started simultaneously at source and destination. Each peak represents a different test: 1, 2, 4, 8, 16, 32, and 64 concurrent streams per job were initiated for 5-minute intervals (e.g., when the concurrency level is 4, there are 40 streams in total).
19. Effects of many concurrent streams
ANI Testbed, 100Gbps (10x10 NICs, three hosts): interrupts/CPU vs. the number of concurrent transfers [1, 2, 4, 8, 16, 32, 64 concurrent jobs, 5-minute intervals]; the TCP buffer size is 50M.
20. Analysis of Core Affinities (NUMA effect)
Nathan Hanford et al., NDM'13
[Figure: Sandy Bridge architecture and placement of the receive process]
21. Analysis of Core Affinities (NUMA effect)
Nathan Hanford et al., NDM'14
25. Advantages
• Decoupling I/O and network operations:
  • front-end (I/O processing)
  • back-end (networking layer)
• Not limited by the characteristics of the file sizes
  • on-the-fly tar approach, bundling and sending many files together
• Dynamic data channel management: can increase/decrease the parallelism level both in the network communication and in I/O read/write operations, without closing and reopening the data channel connection (as is done in regular FTP variants).
MemzNet is not file-centric. Bookkeeping information is embedded inside each block.
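Because bookkeeping is embedded in each block, receivers can place blocks in whatever order they arrive. Below is a rough sketch of such a self-describing block; the header layout (file id, offset, length) is a guess for illustration, not MemzNet's actual wire format.

```python
import struct

# Hypothetical block header: (file_id, offset, length), little-endian.
HEADER = struct.Struct("<QQI")

def pack_block(file_id, offset, payload):
    """Front-end side: tag each data block with its own bookkeeping."""
    return HEADER.pack(file_id, offset, len(payload)) + payload

def unpack_block(block):
    """Back-end side: blocks are self-describing, so any worker can
    place them, in any arrival order."""
    file_id, offset, length = HEADER.unpack_from(block)
    return file_id, offset, block[HEADER.size:HEADER.size + length]

# Blocks arriving out of order still land in the right place.
blocks = [pack_block(7, 4096, b"B" * 4096), pack_block(7, 0, b"A" * 4096)]
image = bytearray(8192)
for blk in blocks:
    _, off, data = unpack_block(blk)
    image[off:off + len(data)] = data
assert bytes(image) == b"A" * 4096 + b"B" * 4096
```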
27. 100Gbps Demo
• CMIP3 data (35TB) from the GPFS filesystem at NERSC
• Block size 4MB
• Each block's data section was aligned according to the system page size.
• 1GB cache both at the client and at the server
• At NERSC, 8 front-end threads on each host for reading data files in parallel.
• At ANL/ORNL, 4 front-end threads for processing received data blocks.
• 4 parallel TCP streams (four back-end threads) were used for each host-to-host connection.
28. MemzNet’s
Performance
TCP
buffer
size
is
set
to
50MB
MemzNetGridFTP
100Gbps demo
ANI Testbed
28
29. Challenge?
• High bandwidth brings new challenges!
  • We need a substantial amount of processing power and the involvement of multiple cores to fill a 40Gbps or 100Gbps network
  • Fine-tuning, both in the network and application layers, to take advantage of the higher network capacity
• Incremental improvement in current tools?
  • We cannot expect every application to tune and improve every time we change the link technology or speed.
30. MemzNet
• MemzNet: Memory-mapped Network Channel
• High-performance data movement
MemzNet is an initial effort to put a new layer between the application and the transport layer. The main goal is to define a network channel so applications can directly use it without the burden of managing/tuning the network communication.
Tech report: LBNL-6177E
31. MemzNet = New Execution Model
• Luigi Rizzo's netmap proposes a new API to send/receive data over the network
• RDMA programming model: MemzNet as a memory-management component
• IX: Data Plane OS (Adam Belay et al. @ Stanford; similar to MemzNet's model)
• mTCP (event based / replaces send/receive at user level)
• Tanenbaum et al.: minimizing context switches, proposing to use MONITOR/MWAIT for synchronization
32. Outline
• VSAN + VVOL Storage Performance in Virtualized Environments
• PetaShare Distributed Storage + Stork Data Scheduler: Adaptive Tuning + Advanced Buffers
• Data Streaming in High-bandwidth Networks
  • Climate100: Advanced Networking Initiative (ANI) and 100Gbps Demo
  • MemzNet: Memory-Mapped Network Zero-copy Channels
  • Core Affinity and End System Tuning in High-Throughput Flows
• Network Reservation and Online Scheduling (QoS)
  • FlexRes: A Flexible Network Reservation Algorithm
  • SchedSim: Online Scheduling with Advance Provisioning
33. Problem Domain: ESnet's OSCARS
[Map: ESnet topology with hubs (Seattle, Sunnyvale, Sacramento, Boise, Denver, Albuquerque, El Paso, Houston, Kansas City, Chicago, Nashville, Atlanta, Washington DC, New York, Boston), DOE sites (PNNL, SLAC, LBNL, AMES, FNAL, ANL, ORNL, JLAB, PPPL, BNL), and peerings with US R&E networks (DREN/Internet2/NLR/NISN/NASA/USDOI), Canada (CANARIE), Europe (GÉANT/NORDUNET), France (OpenTransit), CERN (USLHCNet, LHCONE), Asia-Pacific (ASGC/KAREN/KREONET2/NUS-GP/ODN/REANNZ/SINET/TRANSPAC/TWAREN/BNP/HEPNET), Australia (AARnet), Latin America (AMPATH/CLARA/CUDI), and Russia and China (GLORIAD)]
• Connecting experimental facilities and supercomputing centers
• On-demand Secure Circuits and Advance Reservation System
• Guaranteed bandwidth between collaborating institutions by delivering network-as-a-service
• Co-allocation of storage and network resources (SRM: Storage Resource Manager); end-to-end reservation: storage + network
OSCARS provides yes/no answers to a reservation request for (bandwidth, start_time, end_time).
34. Reservation Request
• Between edge routers
Need to ensure availability of the requested bandwidth from source to destination for the requested time interval:
• R = {n_source, n_destination, M_bandwidth, t_start, t_end}
  • source/destination end-points
  • requested bandwidth
  • start/end times
Committed reservations between t_start and t_end are examined. The shortest path from source to destination is calculated based on the engineering metric on each link, and a bandwidth-guaranteed path is set up to commit and eventually complete the reservation request for the given time period.
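The admission check described on the slide above can be sketched as follows. This is a schematic reading of the slide, not OSCARS code; the reservation record, capacity table, and overlap test are modeled minimally, and the example numbers mirror the topology used on the next slides.

```python
from dataclasses import dataclass

@dataclass
class Reservation:
    path: list        # e.g. ["A", "B", "D"]
    bandwidth: float  # Mbps
    t_start: float
    t_end: float

def links_of(path):
    # A path is a node sequence; its links are unordered node pairs.
    return {frozenset(edge) for edge in zip(path, path[1:])}

def available(capacity, committed, path, bw, t_start, t_end):
    """True if bw can be guaranteed on every link of `path` during
    [t_start, t_end), given committed reservations on the network."""
    for link in links_of(path):
        used = sum(r.bandwidth for r in committed
                   if r.t_start < t_end and t_start < r.t_end  # time overlap
                   and link in links_of(r.path))
        if capacity[link] - used < bw:
            return False
    return True

# Reservation 1 saturates A-B during (t1, t3), so during (t1, t2)
# 500Mbps still fits via A-C-D but 600Mbps does not.
capacity = {frozenset(e): c for e, c in [
    (("A", "B"), 900), (("B", "D"), 1000), (("A", "C"), 500),
    (("C", "D"), 800), (("B", "C"), 300)]}
committed = [Reservation(["A", "B", "D"], 900, 1, 3)]
print(available(capacity, committed, ["A", "C", "D"], 500, 1, 2))  # True
print(available(capacity, committed, ["A", "C", "D"], 600, 1, 2))  # False
```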
35. Reservation
• Components (graph):
  • node (router), port, link (connecting two ports)
  • engineering metric (~latency)
  • maximum bandwidth (capacity)
• Reservation: source, destination, path, time
  • (time t1, t3) A -> B -> D (900Mbps)
  • (time t2, t3) A -> C -> D (400Mbps)
  • (time t4, t5) A -> B -> D (800Mbps)
[Figure: example topology with link capacities A-B 900Mbps, B-D 1000Mbps, A-C 500Mbps, C-D 800Mbps, B-C 300Mbps; the three reservations shown on a t1..t5 timeline]
36. Example
(time t1, t2): A to D (600Mbps)? NO. A to D (500Mbps)? YES.
Active reservations:
• reservation 1: (time t1, t3) A -> B -> D (900Mbps)
• reservation 2: (time t2, t3) A -> C -> D (400Mbps)
• reservation 3: (time t4, t5) A -> B -> D (800Mbps)
[Figure: each link labeled available/reserved (capacity) for the window (t1, t2): A-B 0Mbps/900Mbps (900Mbps), B-D 100Mbps/900Mbps (1000Mbps), C-D 800Mbps/0Mbps (800Mbps), A-C 500Mbps/0Mbps (500Mbps), B-C 300Mbps/0Mbps (300Mbps)]
37. Example
(time t1, t3): A to D (500Mbps)? NO. A to C (500Mbps)? NO (not max-flow!).
Active reservations:
• reservation 1: (time t1, t3) A -> B -> D (900Mbps)
• reservation 2: (time t2, t3) A -> C -> D (400Mbps)
• reservation 3: (time t4, t5) A -> B -> D (800Mbps)
[Figure: each link labeled available/reserved (capacity) for the window (t1, t3): A-B 0Mbps/900Mbps (900Mbps), B-D 100Mbps/900Mbps (1000Mbps), C-D 400Mbps/400Mbps (800Mbps), A-C 100Mbps/400Mbps (500Mbps), B-C 300Mbps/0Mbps (300Mbps)]
38. Alternative Approach: Flexible Reservations
• If the requested bandwidth cannot be guaranteed:
  • trial and error until an available reservation is found
  • the client is not given other possible options
• How can we enhance the OSCARS reservation system? Be flexible:
  • submit constraints, and the system suggests possible reservation options satisfying the given requirements
R_s' = {n_source, n_destination, M_MAXbandwidth, D_dataSize, t_EarliestStart, t_LatestEnd}
The reservation engine finds the reservation R = {n_source, n_destination, M_bandwidth, t_start, t_end} for the earliest completion or for the shortest duration, where M_bandwidth <= M_MAXbandwidth and t_EarliestStart <= t_start < t_end <= t_LatestEnd.
39. Bandwidth Allocation (time-dependent)
Modified Dijkstra's algorithm (maximum available bandwidth):
• bottleneck constraint (not additive)
• (a QoS constraint is additive in shortest-path problems, etc.)
Finds the maximum bandwidth available for allocation from a source node to a destination node.
[Figure: time-dependent availability over time steps t1..t6]
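A compact sketch of that bottleneck variant: Dijkstra's additive relaxation becomes min(bandwidth so far, link capacity), maximized instead of minimized. The adjacency map below encodes the (t1, t3) snapshot from the earlier example; it is an illustration, not the production algorithm.

```python
import heapq

def widest_path(graph, src, dst):
    """Max-bandwidth (bottleneck) path: like Dijkstra, but a path's
    'length' is the minimum available bandwidth along it, and we
    maximize that minimum.  graph: {node: {neighbor: bandwidth}}."""
    best = {src: float("inf")}
    heap = [(-best[src], src)]           # max-heap via negation
    while heap:
        bw, u = heapq.heappop(heap)
        bw = -bw
        if u == dst:
            return bw
        if bw < best.get(u, 0):
            continue                     # stale heap entry
        for v, cap in graph[u].items():
            cand = min(bw, cap)          # bottleneck, not a sum
            if cand > best.get(v, 0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return 0.0

# Available bandwidth during the window (t1, t3) from the example:
g = {"A": {"B": 0, "C": 100}, "B": {"A": 0, "D": 100, "C": 300},
     "C": {"A": 100, "D": 400, "B": 300}, "D": {"B": 100, "C": 400}}
print(widest_path(g, "A", "D"))   # 100, so a 500Mbps request is refused
```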
40. Analogous Example
• A vehicle travelling from city A to city B
• There are multiple cities between A and B connected with separate highways.
• Each highway has a specific speed limit (maximum bandwidth)
• But we need to reduce our speed if there is a high traffic load on the road
• We know the load on each highway for every time period (active reservations)
• The first question is which path the vehicle should follow in order to reach city B from city A as early as possible (earliest completion)
• Or, we can delay our journey and start later if the total travel time would be reduced. The second question is to find the route, along with the starting time, for the shortest travel duration (shortest duration)
Advance bandwidth reservation: we have to set the speed limit before starting and cannot change it during the journey.
41. Time steps
• Time steps between t1 and t13
[Figure: reservations 1, 2, and 3 on a timeline t1..t13; their start/end points (t1, t4, t6, t7, t9, t12, t13) partition the search interval into time steps ts1, ts2, ts3, ts4, ...]
Max (2r+1) time steps, where r is the number of reservations.
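Deriving the time steps from committed reservations is mechanical, as in this sketch (integer labels stand in for the abstract times t1, t4, and so on):

```python
def time_steps(reservations, search_start, search_end):
    """Split [search_start, search_end] at every reservation boundary
    inside it; consecutive boundaries delimit time steps, within which
    link availability is constant. At most 2r+1 steps for r reservations."""
    points = {search_start, search_end}
    for t_start, t_end in reservations:
        for t in (t_start, t_end):
            if search_start < t < search_end:
                points.add(t)
    edges = sorted(points)
    return list(zip(edges, edges[1:]))

# Reservations (t1,t6), (t4,t7), (t9,t12) searched over [t1, t13]:
print(time_steps([(1, 6), (4, 7), (9, 12)], 1, 13))
# [(1, 4), (4, 6), (6, 7), (7, 9), (9, 12), (12, 13)]
```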
42. Static Graphs
[Figure: one static snapshot graph per time step, labeled with the available bandwidth on each link:
G(ts1), t1-t4 (Res 1 active): A-B 0Mbps, B-D 100Mbps, C-D 800Mbps, A-C 500Mbps, B-C 300Mbps
G(ts2), t4-t6 (Res 1 and 2 active): A-B 0Mbps, B-D 100Mbps, C-D 400Mbps, A-C 100Mbps, B-C 300Mbps
G(ts3), t6-t7 (Res 2 active): A-B 900Mbps, B-D 1000Mbps, C-D 400Mbps, A-C 100Mbps, B-C 300Mbps
G(ts4), t7-t9 (no reservation active): A-B 900Mbps, B-D 1000Mbps, C-D 800Mbps, A-C 500Mbps, B-C 300Mbps]
43. Time Windows
A time window combines consecutive time steps. Under the bottleneck constraint, the window's graph is the element-wise minimum of its step graphs, e.g. G(tw) = G(ts1) × G(ts2) for tw = ts1 + ts2.
[Figure: combined graphs for two windows:
t1-t6 (Res 1, 2): A-B 0Mbps, B-D 100Mbps, C-D 400Mbps, A-C 100Mbps, B-C 300Mbps
t6-t9 (Res 2): A-B 900Mbps, B-D 1000Mbps, C-D 400Mbps, A-C 100Mbps, B-C 300Mbps]
Max (s × (s + 1))/2 time windows, where s is the number of time steps.
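Composing consecutive step graphs into a window graph is an element-wise minimum over link availabilities; a small sketch using the snapshots from the previous slide:

```python
def combine(g1, g2):
    """Bottleneck composition: bandwidth available over a time window
    is the minimum of its availability in the constituent time steps."""
    return {u: {v: min(bw, g2[u][v]) for v, bw in nbrs.items()}
            for u, nbrs in g1.items()}

# Step snapshots (available bandwidth per link, directed for brevity):
g_ts1 = {"A": {"B": 0, "C": 500}, "B": {"D": 100, "C": 300},
         "C": {"D": 800}}                     # only reservation 1 active
g_ts2 = {"A": {"B": 0, "C": 100}, "B": {"D": 100, "C": 300},
         "C": {"D": 400}}                     # reservations 1 and 2 active
g_window = combine(g_ts1, g_ts2)              # window t1-t6
print(g_window["A"]["C"], g_window["C"]["D"]) # 100 400
```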
44. Time Window List (special data structures)
Time windows list: [now, infinite]
New reservation: reservation 1, start t1, end t10 ->
[now, t1] [t1, t10: Res 1] [t10, infinite]
New reservation: reservation 2, start t12, end t20 ->
[now, t1] [t1, t10: Res 1] [t10, t12] [t12, t20: Res 2] [t20, infinite]
Careful software design makes the implementation fast and efficient.
45. Performance
max-bandwidth path ~ O(n^2), where n is the number of nodes in the topology graph.
In the worst case, we may need to search all time windows: (s × (s + 1))/2, where s is the number of time steps. If there are r committed reservations in the search period, there can be a maximum of 2r + 1 different time steps in the worst case.
Overall, the worst-case complexity is bounded by O(r^2 n^2).
Note: r is relatively very small compared to the number of nodes n.
46. Example
Reservation 1: (time t1, t6) A -> B -> D (900Mbps)
Reservation 2: (time t4, t7) A -> C -> D (400Mbps)
Reservation 3: (time t9, t12) A -> B -> D (700Mbps)
[Figure: topology with capacities A-B 900Mbps, B-D 1000Mbps, A-C 500Mbps, C-D 800Mbps, B-C 300Mbps; the three reservations on a t1..t13 timeline]
Request: from A to D (earliest completion); max bandwidth = 200Mbps; volume = 200Mbps × 4 time slots; earliest start = t1, latest finish = t13.
47. Search Order - Time Windows
[Figure: time steps t1-t4, t4-t6, t6-t7, t7-t9, t9-t12, t12-t13 with the active reservations per step; candidate windows examined: t1-t4, t4-t6, t1-t6, t6-t7, t4-t7, t1-t7, t7-t9, t6-t9, t4-t9, t1-t9, ...]
Max bandwidth from A to D per examined window:
1. 900Mbps (3)
2. 100Mbps (2)
3. 100Mbps (5)
4. 900Mbps (1)
5. 100Mbps (3)
6. 100Mbps (6)
7. 900Mbps (2)
8. 900Mbps (3)
9. 100Mbps (5)
10. 100Mbps (8)
Resulting reservation: (A to D) (100Mbps), start = t1, end = t9.
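Putting the pieces together, an earliest-completion search scans candidate windows in order and stops at the first one the request fits into. A schematic sketch, assuming a widest(graph) helper like the Dijkstra variant sketched earlier:

```python
def earliest_completion(windows, volume, max_bw, widest):
    """windows: (t_start, t_end, graph) tuples, ordered so that earlier
    completion times are examined first. Returns the chosen reservation
    or None if nothing fits before the latest end time."""
    for t_start, t_end, graph in windows:
        bw = min(max_bw, widest(graph))   # path bottleneck caps the rate
        if bw <= 0:
            continue
        finish = t_start + volume / bw    # does it complete inside the window?
        if finish <= t_end:
            return {"bandwidth": bw, "t_start": t_start, "t_end": finish}
    return None
```

The shortest-duration variant is the same scan with a different ranking of the candidate windows, preferring the one that minimizes finish minus start.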
48. Search Order - Time Windows: shortest duration?
[Figure: remaining windows t9-t12, t12-t13, t9-t13 with reservation 3 active]
Max bandwidth from A to D per examined window:
1. 200Mbps (3)
2. 900Mbps (1)
3. 200Mbps (4)
Reservation: (A to D) (200Mbps), start = t9, end = t13.
• From A to D: max bandwidth = 200Mbps, volume = 175Mbps × 4 time slots, earliest start = t1, latest finish = t13
  • earliest completion: (A to D) (100Mbps), start = t1, end = t8
  • shortest duration: (A to D) (200Mbps), start = t9, end = t12.5
49. Source > Network > Destination
[Figure: the same topology with end hosts n1 and n2 attached at the source and destination edges]
Now we have multiple requests.
50. With start/end times
• Each transfer request has start and end times
• n transfer requests are given (each request has a specific amount of profit)
• The objective is to maximize the profit
• If the profit is the same for each request, then the objective is to maximize the number of jobs in a given time period
• Unsplittable Flow Problem: given an undirected graph, route demand from source(s) to destination(s) and maximize/minimize the total profit/cost
The online scheduling method here is inspired by the Gale-Shapley algorithm (also known as the stable marriage problem).
51. Methodology
• Displace other jobs to open space for the new request (we can shift at most n jobs)
• Never accept a job if it causes other committed jobs to break their criteria
• Planning ahead (gives an opportunity for co-allocation)
• Gives a polynomial approximation algorithm
• The preference converts the UFP problem into a Dijkstra path search
• Utilizes time windows/time steps for ranking (better than earliest-deadline-first)
  • earliest completion + shortest duration
  • minimize concurrency
• Even random ranking would work (a relaxation in an NP-hard problem)
53. Recall Time Windows
[Figure: repeats the search-order example from slide 47: candidate windows t1-t4 through t1-t9, the max bandwidth from A to D per window, and the resulting reservation (A to D) (100Mbps), start = t1, end = t9]
54. Test
In real life, the number of nodes and the number of reservations in a given search interval are limited.
See the AINA'13 paper for results and a comparison with different preference metrics.
55. Autonomic Provisioning System
• Generate constraints automatically (without user input):
  • volume (elephant flow?)
  • true deadline, if applicable
  • end-host resource availability
  • burst rate (fixed bandwidth, variable bandwidth)
• Update constraints according to feedback and monitoring
• Minimize operational cost
• An alternative to manual traffic engineering
What is the incentive to make correct reservations?
56. [Figure: wide-area SDN scenario connecting Experimental facility A, Data Center 1, Data Center 2, and Data node B (web access)]
• (1) Experimental facility A generates 30T of data every day, and it needs to be stored in data center 2 before the next run, since local disk space is limited.
• (2) There is a reservation made between data centers 1 and 2. It is used to replicate data files, 1P in total size, when new data is available in data center 2.
• (3) New results are published at data node B; we expect high traffic to download new simulation files for the next couple of months.
Wide-area SDN
57. Example
• The experimental facility periodically transfers data (i.e., every night)
• Data replication happens occasionally, and it will take a week to move 1P of data. It could get delayed a couple of hours with no harm
• Wide-area download traffic will increase gradually; most of the traffic will be during the day
• We can dynamically increase the preference for download traffic in the mornings, give high priority to transferring data from the facility at night, and use the rest of the bandwidth for data replication (and allocate some bandwidth to confirm that it would finish within a week, as usual)
58. Virtual Circuit Reservation Engine
Autonomic provisioning system + monitoring.
Reservation Engine:
- Select the optimal path/time/bandwidth
- Maximize the number of admitted requests
- Increase overall system utilization and network efficiency
- Dynamically update the selected routing path for network efficiency
- Modify existing reservations dynamically to open space/time for new requests
59. THANK YOU
Any questions/comments?
Mehmet Balman
mbalman@lbl.gov
http://balman.info