Overlapping Ring Monitoring Algorithm in TIPC
Jon Maloy, Ericsson Canada Inc. Montreal
April 7th 2017
PURPOSE
When a cluster node becomes unresponsive due to crash, reboot or lost connectivity, we want to:
• Have all affected connections on the remaining nodes aborted
• Inform other users who have subscribed for cluster connectivity events
• Do both within a well-defined, short interval from the occurrence of the event
COMMON SOLUTIONS
1) Crank up the connection keepalive timer
• Network and CPU load quickly get out of hand when there are thousands of connections
• Does not provide a neighbor monitoring service that can be used by others
2) Dedicated full-mesh framework of per-node daemons with frequently probed connections
• Even here, monitoring traffic becomes overwhelming when the cluster size exceeds 100 nodes
• Does not automatically abort any other connections
TIPC SOLUTION: HIERARCHY + FULL MESH
• Full-mesh framework of frequently probed node-to-node “links”
  - At kernel level
  - Provides generic neighbor monitoring service
• Each link endpoint keeps track of all connections to peer node
  - Issues “ABORT” message to its local socket endpoints when connectivity to peer node is lost
• Even this solution causes excessive traffic beyond ~100 nodes
  - CPU load grows with ~N
  - Network load grows with ~N*(N-1)
OTHER SOLUTION: RING
• Each node monitors its two nearest neighbors by heartbeats
  - Low monitoring network overhead, increasing by only ~2*N
• Node loss can also be detected through loss of an iterating token
• Both solutions offered by Corosync
• Hard to handle accidental network partitioning
  - How do we detect loss of nodes not adjacent to fracture point in opposite partition?
  - Consensus on ring topology required
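To put the growth figures quoted for the full-mesh and ring approaches side by side, the following rough sketch simply evaluates the two expressions from the slides, ~N*(N-1) and ~2*N. The function names are illustrative only and are not TIPC or Corosync APIs.

```python
def full_mesh_monitoring(n: int) -> int:
    """Monitoring relationships when every node probes every other node: N*(N-1)."""
    return n * (n - 1)


def ring_monitoring(n: int) -> int:
    """Monitoring relationships when every node probes its two nearest neighbors: 2*N."""
    return 2 * n


if __name__ == "__main__":
    for n in (16, 100, 800):
        print(f"N={n}: full mesh {full_mesh_monitoring(n)}, ring {ring_monitoring(n)}")
```

At N = 100 the full mesh already needs 9,900 monitoring relationships against 200 for the ring, which is the gap the overlapping ring scheme tries to close without inheriting the ring's partitioning problem.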
OTHER SOLUTION: GOSSIP PROTOCOL
• Each node periodically transmits its known network view to a randomly selected set of known neighbors
• Each node knows and monitors only a subset of all nodes
• Scales extremely well
• Used by BitTorrent client Tribler
• Non-deterministic delay until all cluster nodes are informed
  - Potentially very long because of the periodic and random nature of event propagation
  - Unpredictable number of generations to reach last node
• Extra network overhead because of duplicate information spreading
THE CHALLENGE
Finding an algorithm which:
• Has the scalability of Gossip, but with
  - A deterministic set of peer nodes to monitor and update from each node
  - A predictable number of propagation generations before all nodes are reached
  - Predictable, well-defined and short event propagation delay
• Has the light-weight properties of ring monitoring, but
  - Is able to handle accidental network partitioning
• Has the full-mesh link connectivity of TIPC, but
  - Does not require full-mesh active monitoring
THE ANSWER: OVERLAPPING RING MONITORING
• Sort all cluster nodes into a circular list
  - All nodes use same algorithm and criteria
• Select next [√N] - 1 downstream nodes in the list as “local domain” to be actively monitored
  - CPU load increases by ~√N
• Distribute a record describing the local domain to all other nodes in the cluster
• Select and monitor a set of “head” nodes outside the local domain so that no node is more than two active monitoring hops away
  - There will be [√N] - 1 such nodes
  - Guarantees failure discovery even at accidental network partitioning
• Each node now monitors 2 x (√N - 1) neighbors
  - 6 neighbors in a 16 node cluster
  - 56 neighbors in an 800 node cluster
• All nodes use this algorithm
• In total 2 x (√N - 1) x N actively monitored links
  - 96 links in a 16 node cluster
  - 44,800 links in an 800 node cluster
[Figure: ((√N - 1) local domain destinations + (√N - 1) remote “head” destinations) x N = 2 x N x (√N - 1) actively monitored links]
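The selection steps on this slide can be sketched as follows. This is a simplified illustration, not the kernel implementation. It assumes that [√N] means √N rounded up (which is what makes 56 neighbors come out for 800 nodes), that node identities are plain sortable integers, and that every node already shares the same view of the cluster. The head walk, which takes the first node after the local domain and then every node just past the previous head's domain, is one straightforward way to satisfy the two-hop rule. The names `domain_size` and `monitoring_view` are hypothetical.

```python
import math


def domain_size(n: int) -> int:
    """Local domain size [sqrt(N)] - 1, reading [.] as rounding up (assumption)."""
    ceil_sqrt = math.isqrt(n - 1) + 1 if n > 0 else 0  # ceil(sqrt(n)) for n >= 1
    return max(ceil_sqrt - 1, 0)


def monitoring_view(ring: list, self_id: int) -> tuple:
    """Return (local_domain, heads) for self_id in a sorted circular list."""
    n = len(ring)
    d = domain_size(n)
    start = ring.index(self_id)
    # All other nodes in downstream order, starting right after self.
    downstream = [ring[(start + k) % n] for k in range(1, n)]
    local_domain = downstream[:d]
    # A head is assumed to monitor the d nodes following it, so the next
    # node that still needs an active monitor sits d + 1 positions further on.
    heads = downstream[d::d + 1]
    return local_domain, heads


if __name__ == "__main__":
    ring = sorted(range(16))
    local, heads = monitoring_view(ring, 0)
    print("local domain:", local)   # [1, 2, 3]
    print("heads:", heads)          # [4, 8, 12]
    print("neighbors per node (slide formula), N=800:", 2 * domain_size(800))
```

For 16 nodes the sketch gives node 0 a local domain of three nodes and three heads, matching the 6 neighbors quoted above. For cluster sizes that are not perfect squares the number of heads produced by this simplified walk can differ slightly from the slide's approximate [√N] - 1.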
LOSS OF LOCAL DOMAIN NODE
[Figure, two panels: state change of local domain node detected; domain record distributed to all other nodes in cluster]
• A domain record is sent to all other nodes in the cluster when any state change (discovery, loss, re-establishment) is detected in a local domain node
• The record carries a generation id, so the receiver can tell whether it really contains a change before it starts parsing and applying it
• It is piggy-backed on regular unicast link state/probe messages, which must always be sent out after a domain state change
• May be sent several times until the receiver acknowledges reception of the current generation
• Because probing is driven by a background timer, it may take up to 375 ms (configurable) until all nodes are updated
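The generation check and acknowledgement described above might look roughly like this on the receiving side. The sketch is illustrative only: the record layout, the class and field names, and the plain integer comparison (which ignores counter wrap-around) are assumptions, not the TIPC wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRecord:
    origin: int          # node that owns this local domain (hypothetical field)
    generation: int      # bumped by the origin on every domain state change
    members: tuple       # nodes in the origin's local domain


class RecordReceiver:
    """Keeps the latest domain record seen from each origin."""

    def __init__(self) -> None:
        self.records = {}

    def receive(self, rec: DomainRecord) -> int:
        """Apply rec only if its generation is newer; return the generation to acknowledge."""
        known = self.records.get(rec.origin)
        if known is None or rec.generation > known.generation:
            # Only now is it worth parsing and applying the record.
            self.records[rec.origin] = rec
        # Acknowledging the stored generation lets the sender stop retransmitting.
        return self.records[rec.origin].generation


if __name__ == "__main__":
    rx = RecordReceiver()
    print(rx.receive(DomainRecord(origin=1, generation=1, members=(2, 3, 4))))
    print(rx.receive(DomainRecord(origin=1, generation=1, members=(2, 3, 4))))  # repeat, ignored
    print(rx.receive(DomainRecord(origin=1, generation=2, members=(2, 4))))     # node 3 lost
```

Because the record rides on probe messages that are sent anyway, a repeated copy costs the receiver only the generation comparison.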
LOSS OF ACTIVELY MONITORED HEAD NODE
[Figure, three panels: node failure detected; brief confirmation probing of lost node’s domain members; after recalculation. Legend: actively monitored nodes outside local domain]
• The two-hop criterion plus confirmation probing eliminates the network partitioning problem
• If we really have a partition, worst-case failure detection time will be
  - Tfailmax = 2 x active failure detection time
• Active failure detection time is configurable
  - 50 ms to 10 s
  - Default 1.5 s in TIPC/Linux 4.7
LOSS OF INDIRECTLY MONITORED NODE
[Figure, two panels: actively monitoring neighbors discover failure; actively monitoring neighbors report failure. Legend: actively monitored nodes outside local domain]
• Max one event propagation hop
• Near uniform failure detection time across the whole cluster
• Tfailmax = active failure detection time + (1 x event propagation hop time)
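As a worked example, the two worst-case bounds given for head-node loss and for indirectly monitored node loss can be evaluated with the default figures quoted in the deck. The sketch assumes that the event propagation hop time is bounded by the 375 ms record distribution delay mentioned earlier; the slides do not state that equivalence explicitly, and the function names are hypothetical.

```python
DEFAULT_ACTIVE_DETECT_MS = 1500   # default active failure detection time (TIPC/Linux 4.7)
ASSUMED_HOP_MS = 375              # assumption: one propagation hop takes at most 375 ms


def partition_worst_case_ms(active_detect_ms: int) -> int:
    """Tfailmax for a real partition: 2 x active failure detection time."""
    return 2 * active_detect_ms


def indirect_worst_case_ms(active_detect_ms: int, hop_ms: int) -> int:
    """Tfailmax for an indirectly monitored node: detection time plus one hop."""
    return active_detect_ms + 1 * hop_ms


if __name__ == "__main__":
    print(partition_worst_case_ms(DEFAULT_ACTIVE_DETECT_MS))                 # 3000
    print(indirect_worst_case_ms(DEFAULT_ACTIVE_DETECT_MS, ASSUMED_HOP_MS))  # 1875
```

With the defaults, a failure behind a partition is detected within 3 s and a failure of an indirectly monitored node within about 1.9 s under the stated assumption.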
DIFFERING NETWORK VIEWS
[Figure legend: actively monitored nodes outside local domain]
A node has discovered a peer that nobody else is monitoring:
• Actively monitor that node
• Add it to its circular list according to algorithm (as local domain member or “head”)
• Handle its domain members according to algorithm (“applied” or “non-applied”)
• Continue calculating the monitoring view from the next peer
A node is unable to discover a peer that others are monitoring:
• Don’t add the peer to the circular list
• Ignore it during the calculation of the monitoring view
• Keep it as “non-applied” in the copies of received domain records
• Apply it to the monitoring view if it is discovered at a later moment
Transiently, this happens all the time, and must be considered a normal situation
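One possible reading of the second case, where a peer appears in received domain records but has not been discovered locally, is sketched below. The helper names and the use of a plain set for locally discovered peers are assumptions made for illustration. The sketch covers only the bookkeeping: undiscovered peers stay out of the circular list and are marked “non-applied” until discovery succeeds.

```python
def usable_ring(announced_peers, discovered):
    """Circular list input: only peers this node has itself discovered."""
    return sorted(p for p in set(announced_peers) if p in discovered)


def classify_members(record_members, discovered):
    """Mark each member of a received domain record as applied or non-applied."""
    return {
        m: ("applied" if m in discovered else "non-applied")
        for m in record_members
    }


if __name__ == "__main__":
    discovered = {1, 2, 4}
    record = (2, 3, 4)                 # a head's domain record mentions peer 3
    print(usable_ring([1, 2, 3, 4], discovered))
    print(classify_members(record, discovered))
    discovered.add(3)                  # peer 3 is discovered later
    print(classify_members(record, discovered))
```

Re-running the classification after a later discovery is what moves the peer into the monitoring view, which fits the remark that such mismatches are transient and normal.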
STATUS LISTING OF 16 NODE CLUSTER
STATUS LISTING OF 600 NODE CLUSTER
THE END
