This document summarizes the DevoFlow paper, which proposes techniques to scale flow management for high-performance networks. It finds that per-flow management in OpenFlow introduces high overheads. DevoFlow aims to balance network control, statistics collection, and switch overhead by devolving most flow control to switches while maintaining partial visibility of significant flows. Simulation results show DevoFlow can reduce flow scheduling overheads compared to per-flow control, while still achieving high performance.
DevoFlow - Scaling Flow Management for High-Performance Networks
1. DevoFlow: Scaling Flow Management for High-Performance Networks
Andrew R. Curtis (University of Waterloo); Jeffrey C. Mogul, Jean Tourrilhes, Praveen Yalagandula, Puneet Sharma, Sujata Banerjee (HP Labs), SIGCOMM 2011
Presenter: Jason, Tsung-Cheng, HOU
Advisor: Wanjiun Liao
Mar. 22nd, 2012
2. Motivation
• SDN / OpenFlow can enable per-flow management… however…
• What are the costs and limitations?
• Does a network-wide logical graph mean always collecting every flow's stats?
• Are there problems beyond controller scalability?
• Does enhancing controller performance / scalability solve all problems?
3. DevoFlow Contributions
• Characterize the overheads of implementing OpenFlow on switches
• Evaluate flow management capability within a data center network environment
• Propose DevoFlow to enable scalable flow management by balancing
– Network control
– Statistics collection
– Overheads
– Switch functions and controller loads
4. Agenda
• OF Benefits, Bottlenecks, and Dilemmas
• Evaluation of Overheads
• DevoFlow
• Simulation Results
5. Benefits
• Flexible policies without switch-by-switch configuration
• Network graph and visibility, statistics collection
• Enables traffic engineering and network management
• OpenFlow switches are relatively simple
• Accelerates innovation:
– VL2, PortLand: new architectures, virtualized addressing
– Hedera: flow scheduling
– ElasticTree: energy-proportional networking
• However, no further estimation of the overheads
6. Bottlenecks
• Root cause: excessively couples central control and complete visibility
• Controller bottleneck: can be scaled out with distributed systems
• Switch bottleneck:
– Limited bandwidth between the data and control planes
– Enormous flow tables, too many entries
– Control and statistics packets compete for bandwidth
– Introduces extra delays and latencies
• The switch bottleneck was not well studied
7. Dilemma
• Control dilemma:
– Role of the controller: visibility and management capability; however, per-flow setup is too costly
– Wildcard or hash-based flow matching: much less load, but no effective control
• Statistics-gathering dilemma:
– Pull-based mechanism: counters for all flows give full visibility but demand high bandwidth
– Wildcard counter aggregation: far fewer entries, but loses track of elephant flows
• Aim to strike a balance in between
8. Main Concept of DevoFlow
• Devolve most flow control to switches
• Maintain partial visibility
• Keep track of significant flows
• Default vs. special actions:
– Security-sensitive flows: categorically inspected
– Normal flows: may evolve into, or cover, flows that become security-sensitive or significant
– Significant flows: given special attention
• Collect statistics by sampling, triggering, and approximating
9. Design Principles of DevoFlow
• Try to stay in the data plane by default
• Provide enough visibility:
– Especially for significant and security-sensitive flows
– Otherwise, aggregate or approximate statistics
• Maintain the simplicity of switches
10. Agenda
• OF Benefits, Bottlenecks, and Dilemmas
• Evaluation of Overheads
• DevoFlow
• Simulation Results
11. Overheads: Control Packets
An N-switch path
For a path with N switches: N+1 control packets
• The first packet of the flow goes to the controller
• N control messages, one to each switch
Average length of a flow in 1997: 20 packets
In a Clos / fat-tree DCN topology, a path crosses 5 switches → 6 control packets per flow
The smaller the flow, the higher the relative bandwidth cost
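A back-of-the-envelope sketch of that bandwidth cost, assuming ~128-byte control messages and 1500-byte data packets (illustrative values, not figures from the slide):

```python
# Rough per-flow control overhead for the slide's example path (assumed sizes).
def control_overhead(path_switches=5, flow_pkts=20,
                     ctrl_pkt_bytes=128, data_pkt_bytes=1500):
    ctrl_pkts = path_switches + 1   # first packet to the controller + one flow-mod per switch
    ratio = (ctrl_pkts * ctrl_pkt_bytes) / (flow_pkts * data_pkt_bytes)
    return ctrl_pkts, ratio

pkts, ratio = control_overhead()
print(f"{pkts} control packets, ~{ratio:.1%} extra bytes for a 20-packet flow")
# A 2-packet flow with the same setup cost would pay roughly 25% extra bytes.
```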
12. Overheads: Flow Setup
• Switches have finite bandwidth between the data and control planes, i.e. overhead between the ASIC and the CPU
• Setup capability: 275~300 flows/sec
• Similar to [30]
• In a data center: mean flow interarrival per server is 30 ms
• A rack with 40 servers → ~1,300 flow setups/sec
• Across the whole data center: far higher still
[43] R. Sherwood, G. Gibb, K.-K. Yap, G. Appenzeller, M. Casado, N. McKeown, and G. Parulkar. Can the production network be the testbed? In OSDI, 2010.
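The rack-level arithmetic behind those numbers, as a quick sketch (the 30 ms interarrival and 275~300 setups/sec figures are the ones quoted above):

```python
# Flow-setup demand of one rack vs. a switch's measured setup capacity.
servers_per_rack = 40
mean_interarrival_s = 0.030                 # per server, from the slide
switch_setup_capacity = 300                 # flow setups/sec (upper end of 275~300)

rack_demand = servers_per_rack / mean_interarrival_s   # ≈ 1333 setups/sec
print(f"rack demand ≈ {rack_demand:.0f} setups/s, "
      f"≈ {rack_demand / switch_setup_capacity:.1f}x the switch's setup capacity")
```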
17. Overheads: Gathering Stats
• Per [30], even most long-lived flows last only a few seconds
• Counters: (packets, bytes, duration)
• Push-based: sent to the controller when a flow ends
• Pull-based: fetched actively by the controller
• 88F bytes for F flows
• In the 5406zl switch:
Entries: 1.5K wildcard-match / 13K exact-match
→ total ~1.3 MB per pull; at 2 fetches/sec, ~17 Mbps
Not fast enough, and it consumes a lot of bandwidth!
[30] S. Kandula, S. Sengupta, A. Greenberg, and P. Patel. The Nature of Datacenter Traffic: Measurements & Analysis. In Proc. IMC, 2009.
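A sketch of the pull-bandwidth arithmetic; the simple product below lands near, though not exactly at, the slide's ~17 Mbps figure, so treat the constants as approximations:

```python
# Control-channel load of pulling every counter from a full 5406zl table.
entries = 1_500 + 13_000          # wildcard + exact-match entries
bytes_per_entry = 88              # counter record size quoted on the slide
fetches_per_sec = 2

per_pull_mb = entries * bytes_per_entry / 1e6                  # ≈ 1.3 MB
load_mbps = entries * bytes_per_entry * 8 * fetches_per_sec / 1e6
print(f"≈ {per_pull_mb:.1f} MB per pull, ≈ {load_mbps:.0f} Mbps of statistics traffic")
```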
18. Overheads: Gathering Stats
• 2.5 sec to pull 13K entries
• 1 sec to pull 5,600 entries
• 0.5 sec to pull 3,200 entries
19. Overheads: Gathering Stats
• Per-flow setup generates too many table entries
• The more entries, the longer each pull by the controller takes
• The longer the pull, the longer the control loop
• Hedera used a 5-second control loop, but its workload was idealized (Pareto distribution)
• On the VL2 workload, a 5-second loop improves only 1~5% over ECMP
• Per [41], the loop must be shorter than 0.5 sec to do better
[41] C. Raiciu, C. Pluntke, S. Barre, A. Greenhalgh, D. Wischik, and M. Handley. Data center networking with multipath TCP. In HotNets, 2010.
20. Overheads: Competition
• Flow setups and stat pulls compete for bandwidth
• Timely stats are needed for scheduling
• Switch flow entries:
– OpenFlow: wildcard rules in TCAMs consume a lot of power and space
– Rules: 10 header fields, 288 bits each
– Only 60 bits for traditional Ethernet forwarding
• Per-flow entries vs. per-host entries
22. Agenda
• OF Benefits, Bottlenecks, and Dilemmas
• Evaluation of Overheads
• DevoFlow
• Simulation Results
23. Mechanisms
• Control
– Rule cloning
– Local actions
• Statistics-gathering
– Sampling
– Triggers and reports
– Approximate counters
• Flow scheduler: like Hedera
• Multipath routing: based on a probability distribution, enabling oblivious routing
24. Rule Cloning
• The ASIC clones a wildcard rule into an exact-match rule for each new microflow
• Timeout or output port selected by probability
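A minimal sketch of what rule cloning does, assuming a simple software model of the tables (the class, field names, and clone flag are illustrative, not the switch's actual data structures):

```python
import random

class WildcardRule:
    def __init__(self, match_fn, out_ports, clone=True, idle_timeout_s=5):
        self.match_fn = match_fn        # predicate over the flow's 5-tuple
        self.out_ports = out_ports      # {port: probability} for multipath
        self.clone = clone              # clone flag: devolve this rule to the ASIC
        self.idle_timeout_s = idle_timeout_s

exact_table = {}                        # 5-tuple -> (out_port, idle_timeout_s)

def forward(five_tuple, wildcard_rules):
    if five_tuple in exact_table:       # microflow already has its own cloned rule
        return exact_table[five_tuple][0]
    for rule in wildcard_rules:
        if rule.match_fn(five_tuple):
            ports, weights = zip(*rule.out_ports.items())
            port = random.choices(ports, weights)[0]   # pick a path once per microflow
            if rule.clone:              # install an exact-match clone locally, no controller round trip
                exact_table[five_tuple] = (port, rule.idle_timeout_s)
            return port
    return None                         # table miss: would go to the controller
```

The point of the clone is that it pins the microflow to one output port, so later packets of the same flow are not re-balanced onto a different path.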
27. Local Actions
• Rapid re-routing: fallback paths are predefined, so recovery is almost immediate
• Multipath support: based on a probability distribution, adjusted by link capacity or load
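One obvious way to derive those multipath probabilities from link capacities, as a tiny sketch (the actual weighting scheme is not spelled out on the slide; this is just capacity-proportional weighting):

```python
def path_weights(link_capacity_gbps):
    """Map {output_port: capacity} to a probability distribution over ports."""
    total = sum(link_capacity_gbps.values())
    return {port: cap / total for port, cap in link_capacity_gbps.items()}

print(path_weights({"uplink1": 10, "uplink2": 10, "uplink3": 1}))
# -> uplink1 and uplink2 each get ~47.6% of new microflows, uplink3 ~4.8%
```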
28. Statistics-Gathering
• Sampling
– Packet headers are sent to the controller with 1/1000 probability
• Triggers and reports
– Set a threshold per rule
– When the counter exceeds it, trigger flow setup at the controller
• Approximate counters
– Maintain a list of the top-k largest flows
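A toy sketch of those two reporting paths (the 1/1000 probability is from the slide; the byte threshold and the report callback are placeholders):

```python
import random

SAMPLE_PROB = 1 / 1000
ELEPHANT_THRESHOLD_BYTES = 1 << 20       # 1 MB, placeholder threshold

rule_bytes = {}                          # rule id -> byte counter
already_reported = set()

def on_packet(rule_id, header, pkt_len, report):
    # Sampling: forward only headers, at a fixed probability.
    if random.random() < SAMPLE_PROB:
        report("sample", header)
    # Trigger and report: per-rule byte counter crosses its threshold once.
    rule_bytes[rule_id] = rule_bytes.get(rule_id, 0) + pkt_len
    if rule_bytes[rule_id] > ELEPHANT_THRESHOLD_BYTES and rule_id not in already_reported:
        already_reported.add(rule_id)
        report("trigger", rule_id)       # controller can now set up / reschedule this flow
```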
29. Implementation
• Not yet implemented in hardware
• Switch engineers indicate most mechanisms can be supported with existing functional blocks
• Provides some basic tools for SDN
• However, scaling remains a problem: what threshold? how to sample, and at what rate?
• Default multipath on switches
• The controller samples or sets triggers to detect elephants, then schedules them with a bin-packing algorithm
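A simplified sketch of the kind of greedy bin-packing placement meant here, in the spirit of Hedera's scheduler (the real algorithm estimates flow demand and differs in detail; this only illustrates the idea):

```python
def schedule_elephants(elephants, paths, link_capacity_gbps=10.0):
    """Place detected elephants, largest first, on the path with the most spare capacity.

    elephants: list of (flow_id, rate_gbps); paths: {path_id: [link, ...]}.
    """
    link_load = {}                                        # link -> allocated Gbps
    placement = {}
    for flow_id, rate in sorted(elephants, key=lambda e: -e[1]):
        def spare(path_id):
            return min(link_capacity_gbps - link_load.get(lk, 0.0)
                       for lk in paths[path_id])
        best = max(paths, key=spare)                      # path with the most headroom
        placement[flow_id] = best
        for link in paths[best]:
            link_load[link] = link_load.get(link, 0.0) + rate
    return placement

# Example: two elephants and two disjoint paths end up on different paths.
print(schedule_elephants([("f1", 4.0), ("f2", 3.0)],
                         {"p1": ["l1", "l2"], "p2": ["l3", "l4"]}))
```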
30. Simulation
• How much can flow scheduling overheads be reduced while still achieving high performance?
• Custom-built flow-level simulator, calibrated with the 5406zl experiments
• Workloads generated:
– Reverse-engineered from [30] (MSR, 1,500-server cluster)
– MapReduce shuffle stage, 128 MB sent to each other server
– A combination of these two
[30] S. Kandula, S. Sengupta, A. Greenberg, and P. Patel. The Nature of Datacenter Traffic: Measurements & Analysis. In Proc. IMC, 2009.
42. Conclusion
• Per-flow control imposes too much overhead
• Balancing between
– Overheads and network visibility
– Effective traffic engineering / network management
could lead to various research directions
• Switches have limited resources
– Flow entries / control-plane bandwidth
– Hardware capability / power consumption