SlideShare a Scribd company logo
1 of 40
Hedera: Dynamic Flow
Scheduling for Data Center
Network
Mohammad Al-Fares, Sivasankar
Radhakrishnan, Barath Raghavan, Nelson
Huang, Amin Vahdat
- USENIX NSDI 2010 -
1
Presenter: Jason, Tsung-Cheng, HOU
Advisor: Wanjiun Liao
Dec. 22nd, 2011
Problem
• Relying on multipathing, due to…
– Limited port densities of
routers/switches
– Horizontal expansion
• Multi-rooted tree topologies
– Example: Fat-tree / Clos
2
Problem
• BW demand is essential and volatile
– Must route among multiple paths
– Avoid bottlenecks and deliver aggre. BW
• However, current multipath routing…
– Mostly: flow-hash-based ECMP
– Static and oblivious to link-utilization
– Causes long-term large-flow collisions
• Inefficiently utilizing path diversity
– Need a protocol or a scheduler
3
Collisions of elephant flows
• Collisions in two ways: Upward or Downward
D1S1 D2S2 D3S3 D4S4
Equal Cost Paths
• Many equal cost paths going up to the core
switches
• Only one path down from each core switch
• Need to find good flow-to-core mapping
DS
Goal
• Given a dynamic flow demands
– Need to find paths that maximize
network bisection BW
– No end hosts modifications
• However, local switch information is
unable to find proper allocation
– Need a central scheduler
– Must use commodity Ethernet switches
– OpenFlow
6
Architecture
• Detect Large Flows
– Flows that need bandwidth but are network-limited
• Estimate Flow Demands
– Use min-max fairness to allocate flows between SD
pairs
• Allocate Flows
– Use estimated demands to heuristically find better
placement of large flows on the EC paths
– Arrange switches and iterate again
Detect
Large Flows
Estimate
Flow Demands
Allocate Flows
Architecture
• Feedback loop
• Optimize achievable bisection BW by
assigning flow-to-core mappings
• Heuristics of flow demand estimation and
placement
• Central Scheduler
– Global knowledge of all links in the network
– Control tables of all switches (OpenFlow)
Detect
Large Flows
Estimate
Flow Demands
Allocate Flows
Elephant Detection
9
Elephant Detection
• Scheduler polls edge switches
– Flows exceeding threshold are “large”
– 10% of hosts’ link capacity (> 100Mbps)
• Small flows: Default ECMP hashing
• Hedera complements ECMP
– Default forwarding is ECMP
– Only schedules large flows contributing
to bisection BW bottlenecks
• Centralized functions: the essentials
10
Demand Estimation
11
Demand Estimation
• Current flow rate: misleading
– May be already constrained by network
• Need to find flow’s “natural” BW
demand when not limited by network
– As if only limited by NIC of S or D
• Allocate S/D capacity among flows
using max-min fairness
• Equals to BW allocation of optimal
routing, input to placement algorithm
12
Demand Estimation
• Given pairs of large flows, modify
each flow size at S/D iteratively
– S distributes unconv. BW among flows
– R limited: redistributes BW among
excessive-demand flows
– Repeat until all flows converge
• Guaranteed to converge in O(|F|)
– Linear to no. of flows
13
Demand Estimation
A
B
C
X
Y
Flow Estimate Conv. ?
AX
AY
BY
CY
Sender
Available
Unconv. BW
Flows Share
A 1 2 1/2
B 1 1 1
C 1 1 1
Senders
Demand Estimation
Recv RL?
Non-SL
Flows
Share
X No - -
Y Yes 3 1/3
Receivers
Flow Estimate Conv. ?
AX 1/2
AY 1/2
BY 1
CY 1
A
B
C
X
Y
Demand Estimation
Flow Estimate Conv. ?
AX 1/2
AY 1/3 Yes
BY 1/3 Yes
CY 1/3 Yes
Sender
Available
Unconv. BW
Flows Share
A 2/3 1 2/3
B 0 0 0
C 0 0 0
Senders
A
B
C
X
Y
Demand Estimation
Flow Estimate Conv. ?
AX 2/3 Yes
AY 1/3 Yes
BY 1/3 Yes
CY 1/3 Yes
Recv RL?
Non-SL
Flows
Share
X No - -
Y No - -
Receivers
A
B
C
X
Y
Placement Heuristics
18
Placement Heuristics
• Find a good large-flow-to-core mapping
– such that average bisection BW is maximized
• Two approaches
• Global First Fit: Greedily choose path that
has sufficient unreserved BW
– O([ports/switch]2)
• Simulated Annealing: Iteratively find a
globally better mapping of paths to flows
– O(# flows)
Global First-Fit
• New flow found, linearly search all paths from SD
• Place on first path with links can fit the flow
• Once flow ends, entries + reservations time out
?
Flow A
Flow B
Flow C
? ?
0 1 2 3
Scheduler
S D
Simulated Annealing
• Annealing: letting metal to cool down
and get better crystal structure
– Heating up to enter higher energy state
– Cooling to lower energy state with a
better structure and stopping at a temp
• Simulated Annealing:
– Search neighborhood for possible states
– Probabilistically accepting worse state
– Accepting better state, settle gradually
– Avoid local minima 21
Simulated Annealing
• State / State Space
– Possible solutions
• Energy
– Objective
• Neighborhood
– Other options
• Boltzman’s Function
– Prob. to higher state
• Control Temperature
– Current temp. affect
prob. to higher state
• Cooling Schedule
– How temp. falls
• Stopping Criterion
22
)/(1)( tEEP
Simulated Annealing
• State Space:
– All possible large-flow-to-core mappings
– However, same destinations map to same core
– Reduce state space, as long as not too many
large flows and proper threshold
• Neighborhood:
– Swap cores for two hosts within same pod,
attached to same edge / aggregate
– Avoids local minima
23
Simulated Annealing
• Energy:
– Estimated demand of flows
– Total exceeded BW capacity of links, minimize
• Temperature: remaining iterations
• Probability:
• Final state is published to switches and
used as initial state for next round
• Incremental calculation of exceeded cap.
• No recalculation of all links, only new large
flows found and neighborhood swaps 24
Evaluation
25
Implementation
• 16 hosts, k=4 fat-tree data plane
– 20 switches: 4-port NetFPGAs / OpenFlow
– Parallel 48-port non-blocking Quanta switch
– 1 scheduler, OpenFlow control protocol
– Testbed: PortLand
26
Simulator
• k=32; 8,192 hosts
– Pack-level simulators not applicable
– 1Gbps for 8k hosts, takes 2.5x1011 pkts
• Model TCP flows
– TCP’s AIMD when constrained by topology
– Poisson arrival of flows
– No pkt size variations
– No bursty traffic
– No inter-flow dynamics
27
PortLand/OpenFlow, k=4
28
Simulator
29
Reactiveness
• Demand Estimation:
– 27K hosts, 250K flows, converges < 200ms
• Simulated Annealing:
– Asymptotically dependent on # of flows + #
iter., 50K flows and 1K iter.: 11ms
– Most of final bisection BW: few hundred iter.
• Scheduler control loop:
– Polling + Est. + SA = 145ms for 27K hosts
Comments
31
Comments
• Destine to same host, via same core
– May congest at cores, but how severe?
– Large flows to/from a host: <k/2
– No proof, no evaluation
• Decrease search space and runtime
– Scalable for per-flow basis? For large k?
• No protection for mice flows, RPCs
– Only assumes work well under ECMP
– No address when route with large flows
32
Comments
• Own flow-level simulator
– Aim to saturate network
– No flow number by different size
– Traffic generation: avg. flow size and arrival
rates (Poisson) with a mean
– Only above descriptions, no specific numbers
– Too ideal or not volatile enough?
– Avg. bisection BW, but real-time graphs?
• States that per-flow VLB = per-flow ECMP
– Does not compare with other options (VL2)
– No further elaboration
33
Comments
• Shared responsibility
– Controller only deals with critical situations
– Switches perform default measures
– Improves performance and saves time
– How to strike a balance?
– Adopt to different problems?
• Default multipath routing
– States problems of per-flow VLB and ECMP
– How about per-pkt? Author’s future work
– How to improve switches’ default actions?
34
Comments
• Critical controller actions
– Considers large flows degrade overall efficiency
– What are critical situations?
– How to detect and react?
– How to improve reactiveness and adaptability?
• Amin Vahdat’s lab
– Proposes fat-tree topology
– Develops PortLand L2 virtualization
– Hedera: enhances multipath performance
– Integrate all above
35
References
• M. Al-Fares, et. al., “Hedera: Dynamic Flow Scheduling for
Data Center Network”, USENIX NSDI 2010
• Tathagata Das, “Hedera: Dynamic Flow Scheduling for Data
Center Networks”, UC Berkeley course CS 294
• M. Al-Fares, “Hedera: Dynamic Flow Scheduling for Data
Center Network”, USENIX NSDI 2010, slides
36
Supplement
37
Fault-Tolerance
• Link / Switch failure
– Use PortLand’s fault notification protocol
– Hedera routes around failed components
0 1 3
Flow A
Flow B
Flow C
2
Scheduler
Fault-Tolerance
• Scheduler failure
– Soft-state, not required for correctness
(connectivity)
– Switches fall back to ECMP
0 1 3
Flow A
Flow B
Flow C
2
Scheduler
Limitations
• Dynamic workloads,
large flow turnover
faster than control
loop
– Scheduler will be
continually chasing
the traffic matrix
• Need to include
penalty term for
unnecessary SA flow
re-assignmentsFlow Size
MatrixStability
StableUnstable
ECMP Hedera

More Related Content

What's hot

Cisco Live Milan 2015 - BGP advance
Cisco Live Milan 2015 - BGP advanceCisco Live Milan 2015 - BGP advance
Cisco Live Milan 2015 - BGP advanceBertrand Duvivier
 
1000 Ccna Questions And Answers
1000 Ccna Questions And Answers1000 Ccna Questions And Answers
1000 Ccna Questions And AnswersCCNAResources
 
MPLS L3 VPN Deployment
MPLS L3 VPN DeploymentMPLS L3 VPN Deployment
MPLS L3 VPN DeploymentAPNIC
 
Multiprotocol label switching (mpls) - Networkshop44
Multiprotocol label switching (mpls)  - Networkshop44Multiprotocol label switching (mpls)  - Networkshop44
Multiprotocol label switching (mpls) - Networkshop44Jisc
 
Pause frames an overview
Pause frames an overviewPause frames an overview
Pause frames an overviewMapYourTech
 
CCNP Switching Chapter 1
CCNP Switching Chapter 1CCNP Switching Chapter 1
CCNP Switching Chapter 1Chaing Ravuth
 
Packet flow on openstack
Packet flow on openstackPacket flow on openstack
Packet flow on openstackAchhar Kalia
 
Introduction to OpenFlow
Introduction to OpenFlowIntroduction to OpenFlow
Introduction to OpenFlowJoel W. King
 
How is Kafka so Fast?
How is Kafka so Fast?How is Kafka so Fast?
How is Kafka so Fast?Ricardo Paiva
 
Lisa 2015-gluster fs-hands-on
Lisa 2015-gluster fs-hands-onLisa 2015-gluster fs-hands-on
Lisa 2015-gluster fs-hands-onGluster.org
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking WalkthroughThomas Graf
 
OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep diveTrinath Somanchi
 
Border Gateway Protocol (BGP)
Border Gateway Protocol (BGP)Border Gateway Protocol (BGP)
Border Gateway Protocol (BGP)Nutan Singh
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)Kirill Tsym
 

What's hot (20)

Cisco Live Milan 2015 - BGP advance
Cisco Live Milan 2015 - BGP advanceCisco Live Milan 2015 - BGP advance
Cisco Live Milan 2015 - BGP advance
 
1000 Ccna Questions And Answers
1000 Ccna Questions And Answers1000 Ccna Questions And Answers
1000 Ccna Questions And Answers
 
MPLS L3 VPN Deployment
MPLS L3 VPN DeploymentMPLS L3 VPN Deployment
MPLS L3 VPN Deployment
 
Multiprotocol label switching (mpls) - Networkshop44
Multiprotocol label switching (mpls)  - Networkshop44Multiprotocol label switching (mpls)  - Networkshop44
Multiprotocol label switching (mpls) - Networkshop44
 
Pause frames an overview
Pause frames an overviewPause frames an overview
Pause frames an overview
 
Mininet Basics
Mininet BasicsMininet Basics
Mininet Basics
 
CCNP Switching Chapter 1
CCNP Switching Chapter 1CCNP Switching Chapter 1
CCNP Switching Chapter 1
 
Packet flow on openstack
Packet flow on openstackPacket flow on openstack
Packet flow on openstack
 
Introduction to OpenFlow
Introduction to OpenFlowIntroduction to OpenFlow
Introduction to OpenFlow
 
How is Kafka so Fast?
How is Kafka so Fast?How is Kafka so Fast?
How is Kafka so Fast?
 
Lisa 2015-gluster fs-hands-on
Lisa 2015-gluster fs-hands-onLisa 2015-gluster fs-hands-on
Lisa 2015-gluster fs-hands-on
 
Static Routing
Static RoutingStatic Routing
Static Routing
 
EVPN Introduction
EVPN IntroductionEVPN Introduction
EVPN Introduction
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking Walkthrough
 
Broadcast storm
Broadcast stormBroadcast storm
Broadcast storm
 
OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep dive
 
Border Gateway Protocol (BGP)
Border Gateway Protocol (BGP)Border Gateway Protocol (BGP)
Border Gateway Protocol (BGP)
 
bgp(border gateway protocol)
bgp(border gateway protocol)bgp(border gateway protocol)
bgp(border gateway protocol)
 
Layer 2 switching
Layer 2 switchingLayer 2 switching
Layer 2 switching
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 

Similar to Hedera - Dynamic Flow Scheduling for Data Center Networks, an Application of Software-Defined Networking (SDN)

Valiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious RoutingValiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious RoutingJason TC HOU (侯宗成)
 
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureGunawan Jusuf
 
FATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network ArchitectureFATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network ArchitectureAnkita Mahajan
 
DevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksDevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksJason TC HOU (侯宗成)
 
24-ad-hoc.ppt
24-ad-hoc.ppt24-ad-hoc.ppt
24-ad-hoc.pptsumadi26
 
Energy Efficient Routing Approaches in Ad-hoc Networks
                Energy Efficient Routing Approaches in Ad-hoc Networks                Energy Efficient Routing Approaches in Ad-hoc Networks
Energy Efficient Routing Approaches in Ad-hoc NetworksKishan Patel
 
Introduction to backwards learning algorithm
Introduction to backwards learning algorithmIntroduction to backwards learning algorithm
Introduction to backwards learning algorithmRoshan Karunarathna
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.pptPatrick Theuri
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.pptVimalMallick
 
12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.ppt12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.pptalirezakgm
 
Routing protocols-network-layer
Routing protocols-network-layerRouting protocols-network-layer
Routing protocols-network-layerNitesh Singh
 
AusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBRAusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBRAPNIC
 
CS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).pptCS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).pptMekiPetitSeg
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptSmitNiks
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptssuser2cc0d4
 
NZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBRNZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBRAPNIC
 
Tcp congestion control topic in high speed network
Tcp congestion control topic  in high speed networkTcp congestion control topic  in high speed network
Tcp congestion control topic in high speed networkGOKULKANNANMMECLECTC
 
RIPE 76: TCP and BBR
RIPE 76: TCP and BBRRIPE 76: TCP and BBR
RIPE 76: TCP and BBRAPNIC
 

Similar to Hedera - Dynamic Flow Scheduling for Data Center Networks, an Application of Software-Defined Networking (SDN) (20)

Data Center Network Multipathing
Data Center Network MultipathingData Center Network Multipathing
Data Center Network Multipathing
 
Valiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious RoutingValiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious Routing
 
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network Architecture
 
FATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network ArchitectureFATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network Architecture
 
DevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksDevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance Networks
 
24-ad-hoc.ppt
24-ad-hoc.ppt24-ad-hoc.ppt
24-ad-hoc.ppt
 
Energy Efficient Routing Approaches in Ad-hoc Networks
                Energy Efficient Routing Approaches in Ad-hoc Networks                Energy Efficient Routing Approaches in Ad-hoc Networks
Energy Efficient Routing Approaches in Ad-hoc Networks
 
Introduction to backwards learning algorithm
Introduction to backwards learning algorithmIntroduction to backwards learning algorithm
Introduction to backwards learning algorithm
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.ppt
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.ppt
 
12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.ppt12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.ppt
 
Quality of service
Quality of serviceQuality of service
Quality of service
 
Routing protocols-network-layer
Routing protocols-network-layerRouting protocols-network-layer
Routing protocols-network-layer
 
AusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBRAusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBR
 
CS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).pptCS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).ppt
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.ppt
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.ppt
 
NZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBRNZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBR
 
Tcp congestion control topic in high speed network
Tcp congestion control topic  in high speed networkTcp congestion control topic  in high speed network
Tcp congestion control topic in high speed network
 
RIPE 76: TCP and BBR
RIPE 76: TCP and BBRRIPE 76: TCP and BBR
RIPE 76: TCP and BBR
 

More from Jason TC HOU (侯宗成)

More from Jason TC HOU (侯宗成) (11)

A Data Culture in Daily Work - Examples @ KKTV
A Data Culture in Daily Work - Examples @ KKTVA Data Culture in Daily Work - Examples @ KKTV
A Data Culture in Daily Work - Examples @ KKTV
 
Triangulating Data to Drive Growth
Triangulating Data to Drive GrowthTriangulating Data to Drive Growth
Triangulating Data to Drive Growth
 
Design & Growth @ KKTV - uP!ck Sharing
Design & Growth @ KKTV - uP!ck SharingDesign & Growth @ KKTV - uP!ck Sharing
Design & Growth @ KKTV - uP!ck Sharing
 
文武雙全的產品設計 DESIGNING WITH DATA
文武雙全的產品設計 DESIGNING WITH DATA文武雙全的產品設計 DESIGNING WITH DATA
文武雙全的產品設計 DESIGNING WITH DATA
 
Growth @ KKTV
Growth @ KKTVGrowth @ KKTV
Growth @ KKTV
 
Growth 的基石 用戶行為追蹤
Growth 的基石   用戶行為追蹤Growth 的基石   用戶行為追蹤
Growth 的基石 用戶行為追蹤
 
App 的隱形殺手 - 留存率
App 的隱形殺手 - 留存率App 的隱形殺手 - 留存率
App 的隱形殺手 - 留存率
 
Software-Defined Networking , Survey of HotSDN 2012
Software-Defined Networking , Survey of HotSDN 2012Software-Defined Networking , Survey of HotSDN 2012
Software-Defined Networking , Survey of HotSDN 2012
 
Software-Defined Networking SDN - A Brief Introduction
Software-Defined Networking SDN - A Brief IntroductionSoftware-Defined Networking SDN - A Brief Introduction
Software-Defined Networking SDN - A Brief Introduction
 
Introduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network Issues
 
OpenStack Framework Introduction
OpenStack Framework IntroductionOpenStack Framework Introduction
OpenStack Framework Introduction
 

Recently uploaded

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingWSO2
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringWSO2
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 

Recently uploaded (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 

Hedera - Dynamic Flow Scheduling for Data Center Networks, an Application of Software-Defined Networking (SDN)

  • 1. Hedera: Dynamic Flow Scheduling for Data Center Network Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, Amin Vahdat - USENIX NSDI 2010 - 1 Presenter: Jason, Tsung-Cheng, HOU Advisor: Wanjiun Liao Dec. 22nd, 2011
  • 2. Problem • Relying on multipathing, due to… – Limited port densities of routers/switches – Horizontal expansion • Multi-rooted tree topologies – Example: Fat-tree / Clos 2
  • 3. Problem • BW demand is essential and volatile – Must route among multiple paths – Avoid bottlenecks and deliver aggre. BW • However, current multipath routing… – Mostly: flow-hash-based ECMP – Static and oblivious to link-utilization – Causes long-term large-flow collisions • Inefficiently utilizing path diversity – Need a protocol or a scheduler 3
  • 4. Collisions of elephant flows • Collisions in two ways: Upward or Downward D1S1 D2S2 D3S3 D4S4
  • 5. Equal Cost Paths • Many equal cost paths going up to the core switches • Only one path down from each core switch • Need to find good flow-to-core mapping DS
  • 6. Goal • Given a dynamic flow demands – Need to find paths that maximize network bisection BW – No end hosts modifications • However, local switch information is unable to find proper allocation – Need a central scheduler – Must use commodity Ethernet switches – OpenFlow 6
  • 7. Architecture • Detect Large Flows – Flows that need bandwidth but are network-limited • Estimate Flow Demands – Use min-max fairness to allocate flows between SD pairs • Allocate Flows – Use estimated demands to heuristically find better placement of large flows on the EC paths – Arrange switches and iterate again Detect Large Flows Estimate Flow Demands Allocate Flows
  • 8. Architecture • Feedback loop • Optimize achievable bisection BW by assigning flow-to-core mappings • Heuristics of flow demand estimation and placement • Central Scheduler – Global knowledge of all links in the network – Control tables of all switches (OpenFlow) Detect Large Flows Estimate Flow Demands Allocate Flows
  • 10. Elephant Detection • Scheduler polls edge switches – Flows exceeding threshold are “large” – 10% of hosts’ link capacity (> 100Mbps) • Small flows: Default ECMP hashing • Hedera complements ECMP – Default forwarding is ECMP – Only schedules large flows contributing to bisection BW bottlenecks • Centralized functions: the essentials 10
  • 12. Demand Estimation • Current flow rate: misleading – May be already constrained by network • Need to find flow’s “natural” BW demand when not limited by network – As if only limited by NIC of S or D • Allocate S/D capacity among flows using max-min fairness • Equals to BW allocation of optimal routing, input to placement algorithm 12
  • 13. Demand Estimation • Given pairs of large flows, modify each flow size at S/D iteratively – S distributes unconv. BW among flows – R limited: redistributes BW among excessive-demand flows – Repeat until all flows converge • Guaranteed to converge in O(|F|) – Linear to no. of flows 13
  • 14. Demand Estimation A B C X Y Flow Estimate Conv. ? AX AY BY CY Sender Available Unconv. BW Flows Share A 1 2 1/2 B 1 1 1 C 1 1 1 Senders
  • 15. Demand Estimation Recv RL? Non-SL Flows Share X No - - Y Yes 3 1/3 Receivers Flow Estimate Conv. ? AX 1/2 AY 1/2 BY 1 CY 1 A B C X Y
  • 16. Demand Estimation Flow Estimate Conv. ? AX 1/2 AY 1/3 Yes BY 1/3 Yes CY 1/3 Yes Sender Available Unconv. BW Flows Share A 2/3 1 2/3 B 0 0 0 C 0 0 0 Senders A B C X Y
  • 17. Demand Estimation Flow Estimate Conv. ? AX 2/3 Yes AY 1/3 Yes BY 1/3 Yes CY 1/3 Yes Recv RL? Non-SL Flows Share X No - - Y No - - Receivers A B C X Y
  • 19. Placement Heuristics • Find a good large-flow-to-core mapping – such that average bisection BW is maximized • Two approaches • Global First Fit: Greedily choose path that has sufficient unreserved BW – O([ports/switch]2) • Simulated Annealing: Iteratively find a globally better mapping of paths to flows – O(# flows)
  • 20. Global First-Fit • New flow found, linearly search all paths from SD • Place on first path with links can fit the flow • Once flow ends, entries + reservations time out ? Flow A Flow B Flow C ? ? 0 1 2 3 Scheduler S D
  • 21. Simulated Annealing • Annealing: letting metal to cool down and get better crystal structure – Heating up to enter higher energy state – Cooling to lower energy state with a better structure and stopping at a temp • Simulated Annealing: – Search neighborhood for possible states – Probabilistically accepting worse state – Accepting better state, settle gradually – Avoid local minima 21
  • 22. Simulated Annealing • State / State Space – Possible solutions • Energy – Objective • Neighborhood – Other options • Boltzman’s Function – Prob. to higher state • Control Temperature – Current temp. affect prob. to higher state • Cooling Schedule – How temp. falls • Stopping Criterion 22 )/(1)( tEEP
  • 23. Simulated Annealing • State Space: – All possible large-flow-to-core mappings – However, same destinations map to same core – Reduce state space, as long as not too many large flows and proper threshold • Neighborhood: – Swap cores for two hosts within same pod, attached to same edge / aggregate – Avoids local minima 23
  • 24. Simulated Annealing • Energy: – Estimated demand of flows – Total exceeded BW capacity of links, minimize • Temperature: remaining iterations • Probability: • Final state is published to switches and used as initial state for next round • Incremental calculation of exceeded cap. • No recalculation of all links, only new large flows found and neighborhood swaps 24
  • 26. Implementation • 16 hosts, k=4 fat-tree data plane – 20 switches: 4-port NetFPGAs / OpenFlow – Parallel 48-port non-blocking Quanta switch – 1 scheduler, OpenFlow control protocol – Testbed: PortLand 26
  • 27. Simulator • k=32; 8,192 hosts – Pack-level simulators not applicable – 1Gbps for 8k hosts, takes 2.5x1011 pkts • Model TCP flows – TCP’s AIMD when constrained by topology – Poisson arrival of flows – No pkt size variations – No bursty traffic – No inter-flow dynamics 27
  • 30. Reactiveness • Demand Estimation: – 27K hosts, 250K flows, converges < 200ms • Simulated Annealing: – Asymptotically dependent on # of flows + # iter., 50K flows and 1K iter.: 11ms – Most of final bisection BW: few hundred iter. • Scheduler control loop: – Polling + Est. + SA = 145ms for 27K hosts
  • 32. Comments • Destine to same host, via same core – May congest at cores, but how severe? – Large flows to/from a host: <k/2 – No proof, no evaluation • Decrease search space and runtime – Scalable for per-flow basis? For large k? • No protection for mice flows, RPCs – Only assumes work well under ECMP – No address when route with large flows 32
  • 33. Comments • Own flow-level simulator – Aim to saturate network – No flow number by different size – Traffic generation: avg. flow size and arrival rates (Poisson) with a mean – Only above descriptions, no specific numbers – Too ideal or not volatile enough? – Avg. bisection BW, but real-time graphs? • States that per-flow VLB = per-flow ECMP – Does not compare with other options (VL2) – No further elaboration 33
  • 34. Comments • Shared responsibility – Controller only deals with critical situations – Switches perform default measures – Improves performance and saves time – How to strike a balance? – Adopt to different problems? • Default multipath routing – States problems of per-flow VLB and ECMP – How about per-pkt? Author’s future work – How to improve switches’ default actions? 34
  • 35. Comments • Critical controller actions – Considers large flows degrade overall efficiency – What are critical situations? – How to detect and react? – How to improve reactiveness and adaptability? • Amin Vahdat’s lab – Proposes fat-tree topology – Develops PortLand L2 virtualization – Hedera: enhances multipath performance – Integrate all above 35
  • 36. References • M. Al-Fares, et. al., “Hedera: Dynamic Flow Scheduling for Data Center Network”, USENIX NSDI 2010 • Tathagata Das, “Hedera: Dynamic Flow Scheduling for Data Center Networks”, UC Berkeley course CS 294 • M. Al-Fares, “Hedera: Dynamic Flow Scheduling for Data Center Network”, USENIX NSDI 2010, slides 36
  • 38. Fault-Tolerance • Link / Switch failure – Use PortLand’s fault notification protocol – Hedera routes around failed components 0 1 3 Flow A Flow B Flow C 2 Scheduler
  • 39. Fault-Tolerance • Scheduler failure – Soft-state, not required for correctness (connectivity) – Switches fall back to ECMP 0 1 3 Flow A Flow B Flow C 2 Scheduler
  • 40. Limitations • Dynamic workloads, large flow turnover faster than control loop – Scheduler will be continually chasing the traffic matrix • Need to include penalty term for unnecessary SA flow re-assignmentsFlow Size MatrixStability StableUnstable ECMP Hedera