SlideShare a Scribd company logo
1 of 28
Download to read offline
Gossip protocol and applications
Tu Nguyen
Staff Software Engineer - Axon
Gossip protocol
Gossip in computer science
A peer-to-peer communication protocol●
Inspired by epidemics, human gossip and social networks (spreading rumors)●
epidemic protocol (synonym)■
why ?■
rumors or epidemics in society travel at a great speed and reach to almost every member of the community
without needing a central coordinator.
●
Gossip was founded originally to solve Multicast problem●
Multicast●
we want to communicate a message to all the nodes in the network■
each node sends the message to only a few of the nodes■
Multicast problems ?●
Fault-tolerance: node might crash, packet might be dropped, etc○
Scalability: millions, hundreds of millions of nodes○
Centralized: single sender “multi-cast” TCP/UDP packets to others.○
Tree-based multicast: too much redundancy with ACK/NACK msg.○
Multicast was originally heavily used in network devices (eg. routers); how to leverage it in application layer ?○
Gossip basic
A node wants to share some information to the other nodes in the network. Then periodically it
selects randomly a node from the set of nodes and exchanges the information. The node that
receives the information does exactly the same thing.
Cycle●
number of rounds to spread the information■
Fanout●
number of nodes that a node “gossip” within each cycle■
Gossip properties
Node selection must be random (or guarantee enough peer diversity)●
Node only stores local information. There is no shared global state.●
Communication is round-based (periodic).●
Transmission and processing capacity per round is limited.●
All nodes run the same protocol.●
Not deterministic (because of randomness peer sampling).●
Advantages of Gossip
Scalable●
Fault-tolerance●
Robust●
Decentralized●
Convergent consistency●
Gossip modeling
Consider a distributed network where nodes are message-passing to each
other.
State of a node●
Susceptible - node has not received update yet (is not infected).■
Infected - node with an update it is willing to share.■
Removed - node has received the update but is not willing to share.■
Two basic models●
SI (anti-entropy)■
SIR (rumor-mongering)■
When R state happens ?
👉 Many algorithms. One of them are counting for redundant messages.
Gossip modeling
Push / Pull / Push-Pull●
Push■
I nodes are the ones sending/infecting S nodes●
efficient when there are a few updates.●
Pull■
all nodes are actively pulling for updates●
efficient when there are many updates.●
Push-Pull■
node pushes when it has updates and also pulls for new updates●
node and selected node are exchanging information ●
Gossip modeling
https://flopezluis.github.io/gossip-simulator/
Gossip Applications
Applications
Cluster membership●
Information dissemination●
Failure detection●
Database replication●
Overlay network●
Aggregations●
Cluster Membership
 Who are my live peers ?
Desired properties
Connectedness●
Balance●
Short path-length●
Reducing redundancy●
Scalability●
Accuracy●
Full Partial
Full Partial
👍 Connectedness
👍 Short-path length
👌 Accuracy
👌 Balance
👎 High redundancy
👎 Low scalability
👌 Connectedness
👌 Short-path length
👌 Accuracy
👌 Balance
👍 Low redundancy
👍 High scalability
Cluster Membership
✅
SWIM - Cornell University 2002●
SCAMP - Microsoft Research 2003●
CYCLON - Vrije University, The Netherlands, 2005●
HYPARVIEW - University of Lisbon, 2007●
Cluster Membership
SWIM - Cornell university (2002)
Scalable Weakly-consistent Infection-style Process Group
Membership
https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
Properties
Scalable●
Weakly consistent●
Infection-style●
Membership protocol●
SWIM
Motivated by traditional heart-beating●
every interval T, notify peers of liveness■
if no update received from peer P after T * limit, mark P as dead.■
heart-beat = membership + failure detection■
Heart-beat is doing good at:●
completeness - yes!■
strong completeness - every crashed node is eventually detected by all correct
nodes.
●
Accuracy - high!■
Heart-beat problems ?●
Network load: N^2■
SWIM is trying to ...
Separate two problems and solve them one-by-one●
Failure detection (👉 “live” peers)○
Membership protocol (👉 list of peers)○
Optimization●
Reduce network load○
Failure detection○
decrease processing time●
increase accuracy●
Failure Detection properties
One step back...●
The two properties of a distributed system□
Safety - nothing bad ever happens○
Liveness - something good eventually happens.○
Failure Detection properties●
Completeness (L) - failure detector would find the node(s) that finally crashed in the
system. 
□
Accuracy (S) - correct decisions that the failure detector has made in a node.□
Failure Detection properties
Degree of completeness●
depends on number of crashed nodes is suspected by a failure detector in a certain
period
□
Strong completeness - every faulty node is eventually permanently suspected by every non-
faulty node
○
Weak completeness - every faulty node is eventually permanently suspected by some non-faulty
node
○
Degree of accuracy●
depends on number of mistakes that a failure detector made in certain period□
Strong accuracy - no node is suspected (by any node) before it crashes○
Weak accuracy - some non-faulty node is never suspected○
Eventual strong accuracy - after some time, system becomes strong accuracy.○
Eventual weak accuracy - after some time, system becomes weak accuracy.○
SWIM Failure Detection
Each node in set of N node●
Choose a random peer○
Ping - ACK□
Indirect Ping (iff no ACK)○
Choose k random peers□
indirect Ping○
Evaluation:
completeness: every nodes will be pinged!●
accuracy: “high” (🔍)●
speed of detection: 1 * Interval●
network load: (4*k + 2) * N ~ 0(N)●
SWIM Membership Protocol
Aware of join / leave nodes●
Motivated by Gossip●
Piggy-back approach■
Infection-style○
ping is sent to random peer□
eventually (weakly) consistent□
updates send peer-to-peer□
SWIM - Optimization
Suspicion state - to improve accuracy
Trade-off between failure detection time and false positives.●
Introduce suspicion state.●
A 👉 B: Ping! Suspect C failed■
B 👉 A: ACK!■
A few moment later■
A, B 👉 C: Ping! Are you dead ?□
C 👉 A,B: ACK! (i’m not 😋)□
State FSM
SWIM - Optimization
Round-robin probe peer selection
Randomly sort peer set■
Ping in round-robin order■
Evaluation:
Completeness: increase, time-bounded○
State FSM
SWIM - Limitations
Node leave vs fail●
Re-joining●
Event ordering●
Message encryption●
Peer metadata●
Custom payload●
Network participants●
More details:  https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
SWIM - Implementation
memberlist https://github.com/hashicorp/memberlist●
serf, consul, etcd are relying on swim-based memberlist for failure detection and group
membership.
●
Other “announced” applications
Cassandra internal - understand gossip https://www.youtube.com/watch?v=FuP1Fvrv6ZQ●
AWS S3 gossip http://status.aws.amazon.com/s3-20080720.html●
Slicing structured overlay network
T-MAN  https://www.researchgate.net/publication/225403352_T-Man_Gossip-
Based_Overlay_Topology_Management
●
https://managementfromscratch.wordpress.com/2016/04/01/introduction-to-gossip●

More Related Content

What's hot

Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
Grokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking VN
 
Monitoring Microservices
Monitoring MicroservicesMonitoring Microservices
Monitoring MicroservicesWeaveworks
 
Grokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking VN
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesRishabh Indoria
 
Grokking Techtalk: Problem solving for sw engineers
Grokking Techtalk: Problem solving for sw engineersGrokking Techtalk: Problem solving for sw engineers
Grokking Techtalk: Problem solving for sw engineers9diov
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
 
NGINX Installation and Tuning
NGINX Installation and TuningNGINX Installation and Tuning
NGINX Installation and TuningNGINX, Inc.
 
Grokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking VN
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptablesKernel TLV
 
ARM IoT Firmware Emulation Workshop
ARM IoT Firmware Emulation WorkshopARM IoT Firmware Emulation Workshop
ARM IoT Firmware Emulation WorkshopSaumil Shah
 
Chips alliance omni xtend overview
Chips alliance omni xtend overviewChips alliance omni xtend overview
Chips alliance omni xtend overviewRISC-V International
 
Grokking Techtalk #37: Data intensive problem
 Grokking Techtalk #37: Data intensive problem Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #37: Data intensive problemGrokking VN
 
Grokking Techtalk #37: Software design and refactoring
 Grokking Techtalk #37: Software design and refactoring Grokking Techtalk #37: Software design and refactoring
Grokking Techtalk #37: Software design and refactoringGrokking VN
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingAmuhinda Hungai
 
Introduction to Raft algorithm
Introduction to Raft algorithmIntroduction to Raft algorithm
Introduction to Raft algorithmmuayyad alsadi
 
itlchn 20 - Kien truc he thong chung khoan - Phan 2
itlchn 20 - Kien truc he thong chung khoan - Phan 2itlchn 20 - Kien truc he thong chung khoan - Phan 2
itlchn 20 - Kien truc he thong chung khoan - Phan 2IT Expert Club
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecturehugo lu
 

What's hot (20)

Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Grokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous Communications
 
Monitoring Microservices
Monitoring MicroservicesMonitoring Microservices
Monitoring Microservices
 
Grokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystified
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Grokking Techtalk: Problem solving for sw engineers
Grokking Techtalk: Problem solving for sw engineersGrokking Techtalk: Problem solving for sw engineers
Grokking Techtalk: Problem solving for sw engineers
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
NGINX Installation and Tuning
NGINX Installation and TuningNGINX Installation and Tuning
NGINX Installation and Tuning
 
OpenStack Cinder
OpenStack CinderOpenStack Cinder
OpenStack Cinder
 
Grokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellchecking
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
 
ARM IoT Firmware Emulation Workshop
ARM IoT Firmware Emulation WorkshopARM IoT Firmware Emulation Workshop
ARM IoT Firmware Emulation Workshop
 
Chips alliance omni xtend overview
Chips alliance omni xtend overviewChips alliance omni xtend overview
Chips alliance omni xtend overview
 
Grokking Techtalk #37: Data intensive problem
 Grokking Techtalk #37: Data intensive problem Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #37: Data intensive problem
 
Grokking Techtalk #37: Software design and refactoring
 Grokking Techtalk #37: Software design and refactoring Grokking Techtalk #37: Software design and refactoring
Grokking Techtalk #37: Software design and refactoring
 
Prometheus monitoring
Prometheus monitoringPrometheus monitoring
Prometheus monitoring
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed Tracing
 
Introduction to Raft algorithm
Introduction to Raft algorithmIntroduction to Raft algorithm
Introduction to Raft algorithm
 
itlchn 20 - Kien truc he thong chung khoan - Phan 2
itlchn 20 - Kien truc he thong chung khoan - Phan 2itlchn 20 - Kien truc he thong chung khoan - Phan 2
itlchn 20 - Kien truc he thong chung khoan - Phan 2
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
 

Similar to Grokking Techtalk #39: Gossip protocol and applications

Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Module: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconModule: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconIoannis Psaras
 
BSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdfBSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdfStevenJoeBiago
 
Ake hedman why we need to unite and why vscp is a solution to a problem
Ake hedman  why we need to unite and why vscp is a solution to a problemAke hedman  why we need to unite and why vscp is a solution to a problem
Ake hedman why we need to unite and why vscp is a solution to a problemWithTheBest
 
Iot with-the-best & VSCP
Iot with-the-best & VSCPIot with-the-best & VSCP
Iot with-the-best & VSCPAke Hedman
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesDefCamp
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesPriyanka Aash
 
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...Ontico
 
Overcoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for PerformanceOvercoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for PerformanceScyllaDB
 
Computer network (7)
Computer network (7)Computer network (7)
Computer network (7)NYversity
 
Apache cassandra an introduction
Apache cassandra  an introductionApache cassandra  an introduction
Apache cassandra an introductionShehaaz Saif
 
Ple18 web-security-david-busby
Ple18 web-security-david-busbyPle18 web-security-david-busby
Ple18 web-security-david-busbyDavid Busby, CISSP
 
Monitoring - deeper dive
Monitoring  - deeper diveMonitoring  - deeper dive
Monitoring - deeper diveRobert Kubiś
 
Storing the real world data
Storing the real world dataStoring the real world data
Storing the real world dataAthira Mukundan
 
introduction to advanced distributed system
introduction to advanced distributed systemintroduction to advanced distributed system
introduction to advanced distributed systemmilkesa13
 
Computer network (8)
Computer network (8)Computer network (8)
Computer network (8)NYversity
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemVarad Meru
 
IT Security Basics For Managers
IT Security Basics For ManagersIT Security Basics For Managers
IT Security Basics For ManagersDaniel Owens
 
SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)Mohammad Awais Javaid
 
honeypots.ppt
honeypots.ppthoneypots.ppt
honeypots.pptDetSersi
 

Similar to Grokking Techtalk #39: Gossip protocol and applications (20)

Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Module: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconModule: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness Beacon
 
BSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdfBSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdf
 
Ake hedman why we need to unite and why vscp is a solution to a problem
Ake hedman  why we need to unite and why vscp is a solution to a problemAke hedman  why we need to unite and why vscp is a solution to a problem
Ake hedman why we need to unite and why vscp is a solution to a problem
 
Iot with-the-best & VSCP
Iot with-the-best & VSCPIot with-the-best & VSCP
Iot with-the-best & VSCP
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
 
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
 
Overcoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for PerformanceOvercoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for Performance
 
Computer network (7)
Computer network (7)Computer network (7)
Computer network (7)
 
Apache cassandra an introduction
Apache cassandra  an introductionApache cassandra  an introduction
Apache cassandra an introduction
 
Ple18 web-security-david-busby
Ple18 web-security-david-busbyPle18 web-security-david-busby
Ple18 web-security-david-busby
 
Monitoring - deeper dive
Monitoring  - deeper diveMonitoring  - deeper dive
Monitoring - deeper dive
 
Storing the real world data
Storing the real world dataStoring the real world data
Storing the real world data
 
introduction to advanced distributed system
introduction to advanced distributed systemintroduction to advanced distributed system
introduction to advanced distributed system
 
Computer network (8)
Computer network (8)Computer network (8)
Computer network (8)
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage System
 
IT Security Basics For Managers
IT Security Basics For ManagersIT Security Basics For Managers
IT Security Basics For Managers
 
SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)
 
honeypots.ppt
honeypots.ppthoneypots.ppt
honeypots.ppt
 

More from Grokking VN

Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking VN
 
Grokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking VN
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking VN
 
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platformGrokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platformGrokking VN
 
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 Grokking Techtalk #39: How to build an event driven architecture with Kafka ... Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...Grokking VN
 
Grokking Techtalk #38: Escape Analysis in Go compiler
 Grokking Techtalk #38: Escape Analysis in Go compiler Grokking Techtalk #38: Escape Analysis in Go compiler
Grokking Techtalk #38: Escape Analysis in Go compilerGrokking VN
 
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
 Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer... Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...Grokking VN
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking VN
 
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking VN
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking VN
 
Grokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking VN
 
Grokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking VN
 
Grokking TechTalk #26: Compare ios and android platform
Grokking TechTalk #26: Compare ios and android platformGrokking TechTalk #26: Compare ios and android platform
Grokking TechTalk #26: Compare ios and android platformGrokking VN
 
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...Grokking VN
 
Grokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking VN
 
Grokking TechTalk #21: Deep Learning in Computer Vision
Grokking TechTalk #21: Deep Learning in Computer VisionGrokking TechTalk #21: Deep Learning in Computer Vision
Grokking TechTalk #21: Deep Learning in Computer VisionGrokking VN
 
Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101Grokking VN
 
Grokking TechTalk #19: Software Development Cycle In The International Moneta...
Grokking TechTalk #19: Software Development Cycle In The International Moneta...Grokking TechTalk #19: Software Development Cycle In The International Moneta...
Grokking TechTalk #19: Software Development Cycle In The International Moneta...Grokking VN
 
Grokking TechTalk #18B: Giới thiệu về Viễn thông Di động
Grokking TechTalk #18B:  Giới thiệu về Viễn thông Di độngGrokking TechTalk #18B:  Giới thiệu về Viễn thông Di động
Grokking TechTalk #18B: Giới thiệu về Viễn thông Di độngGrokking VN
 
Grokking TechTalk #18B: VoIP Architecture For Telecommunications
Grokking TechTalk #18B: VoIP Architecture For TelecommunicationsGrokking TechTalk #18B: VoIP Architecture For Telecommunications
Grokking TechTalk #18B: VoIP Architecture For TelecommunicationsGrokking VN
 

More from Grokking VN (20)

Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
 
Grokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles Thinking
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...
 
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platformGrokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform
 
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 Grokking Techtalk #39: How to build an event driven architecture with Kafka ... Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 
Grokking Techtalk #38: Escape Analysis in Go compiler
 Grokking Techtalk #38: Escape Analysis in Go compiler Grokking Techtalk #38: Escape Analysis in Go compiler
Grokking Techtalk #38: Escape Analysis in Go compiler
 
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
 Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer... Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
 
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
 
Grokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search Tree
 
Grokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the Magic
 
Grokking TechTalk #26: Compare ios and android platform
Grokking TechTalk #26: Compare ios and android platformGrokking TechTalk #26: Compare ios and android platform
Grokking TechTalk #26: Compare ios and android platform
 
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...
 
Grokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocols
 
Grokking TechTalk #21: Deep Learning in Computer Vision
Grokking TechTalk #21: Deep Learning in Computer VisionGrokking TechTalk #21: Deep Learning in Computer Vision
Grokking TechTalk #21: Deep Learning in Computer Vision
 
Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101
 
Grokking TechTalk #19: Software Development Cycle In The International Moneta...
Grokking TechTalk #19: Software Development Cycle In The International Moneta...Grokking TechTalk #19: Software Development Cycle In The International Moneta...
Grokking TechTalk #19: Software Development Cycle In The International Moneta...
 
Grokking TechTalk #18B: Giới thiệu về Viễn thông Di động
Grokking TechTalk #18B:  Giới thiệu về Viễn thông Di độngGrokking TechTalk #18B:  Giới thiệu về Viễn thông Di động
Grokking TechTalk #18B: Giới thiệu về Viễn thông Di động
 
Grokking TechTalk #18B: VoIP Architecture For Telecommunications
Grokking TechTalk #18B: VoIP Architecture For TelecommunicationsGrokking TechTalk #18B: VoIP Architecture For Telecommunications
Grokking TechTalk #18B: VoIP Architecture For Telecommunications
 

Recently uploaded

Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Crushers to screens in aggregate production
Crushers to screens in aggregate productionCrushers to screens in aggregate production
Crushers to screens in aggregate productionChinnuNinan
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
chpater16.pptxMMMMMMMMMMMMMMMMMMMMMMMMMMM
chpater16.pptxMMMMMMMMMMMMMMMMMMMMMMMMMMMchpater16.pptxMMMMMMMMMMMMMMMMMMMMMMMMMMM
chpater16.pptxMMMMMMMMMMMMMMMMMMMMMMMMMMMNanaAgyeman13
 
Autonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptAutonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptbibisarnayak0
 
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectDM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectssuserb6619e
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptxmohitesoham12
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 

Recently uploaded (20)

Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Crushers to screens in aggregate production
Crushers to screens in aggregate productionCrushers to screens in aggregate production
Crushers to screens in aggregate production
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
chpater16.pptxMMMMMMMMMMMMMMMMMMMMMMMMMMM
chpater16.pptxMMMMMMMMMMMMMMMMMMMMMMMMMMMchpater16.pptxMMMMMMMMMMMMMMMMMMMMMMMMMMM
chpater16.pptxMMMMMMMMMMMMMMMMMMMMMMMMMMM
 
Autonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptAutonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.ppt
 
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectDM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptx
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 

Grokking Techtalk #39: Gossip protocol and applications

  • 1. Gossip protocol and applications Tu Nguyen Staff Software Engineer - Axon
  • 3.
  • 4. Gossip in computer science A peer-to-peer communication protocol● Inspired by epidemics, human gossip and social networks (spreading rumors)● epidemic protocol (synonym)■ why ?■ rumors or epidemics in society travel at a great speed and reach to almost every member of the community without needing a central coordinator. ● Gossip was founded originally to solve Multicast problem● Multicast● we want to communicate a message to all the nodes in the network■ each node sends the message to only a few of the nodes■ Multicast problems ?● Fault-tolerance: node might crash, packet might be dropped, etc○ Scalability: millions, hundreds of millions of nodes○ Centralized: single sender “multi-cast” TCP/UDP packets to others.○ Tree-based multicast: too much redundancy with ACK/NACK msg.○ Multicast was originally heavily used in network devices (eg. routers); how to leverage it in application layer ?○
  • 5. Gossip basic A node wants to share some information to the other nodes in the network. Then periodically it selects randomly a node from the set of nodes and exchanges the information. The node that receives the information does exactly the same thing. Cycle● number of rounds to spread the information■ Fanout● number of nodes that a node “gossip” within each cycle■
  • 6. Gossip properties Node selection must be random (or guarantee enough peer diversity)● Node only stores local information. There is no shared global state.● Communication is round-based (periodic).● Transmission and processing capacity per round is limited.● All nodes run the same protocol.● Not deterministic (because of randomness peer sampling).●
  • 8. Gossip modeling Consider a distributed network where nodes are message-passing to each other. State of a node● Susceptible - node has not received update yet (is not infected).■ Infected - node with an update it is willing to share.■ Removed - node has received the update but is not willing to share.■ Two basic models● SI (anti-entropy)■ SIR (rumor-mongering)■ When R state happens ? 👉 Many algorithms. One of them are counting for redundant messages.
  • 9. Gossip modeling Push / Pull / Push-Pull● Push■ I nodes are the ones sending/infecting S nodes● efficient when there are a few updates.● Pull■ all nodes are actively pulling for updates● efficient when there are many updates.● Push-Pull■ node pushes when it has updates and also pulls for new updates● node and selected node are exchanging information ●
  • 13. Applications Cluster membership● Information dissemination● Failure detection● Database replication● Overlay network● Aggregations●
  • 14. Cluster Membership  Who are my live peers ? Desired properties Connectedness● Balance● Short path-length● Reducing redundancy● Scalability● Accuracy● Full Partial
  • 15. Full Partial 👍 Connectedness 👍 Short-path length 👌 Accuracy 👌 Balance 👎 High redundancy 👎 Low scalability 👌 Connectedness 👌 Short-path length 👌 Accuracy 👌 Balance 👍 Low redundancy 👍 High scalability Cluster Membership ✅
  • 16. SWIM - Cornell University 2002● SCAMP - Microsoft Research 2003● CYCLON - Vrije University, The Netherlands, 2005● HYPARVIEW - University of Lisbon, 2007● Cluster Membership
  • 17. SWIM - Cornell university (2002) Scalable Weakly-consistent Infection-style Process Group Membership https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf Properties Scalable● Weakly consistent● Infection-style● Membership protocol●
  • 18. SWIM Motivated by traditional heart-beating● every interval T, notify peers of liveness■ if no update received from peer P after T * limit, mark P as dead.■ heart-beat = membership + failure detection■ Heart-beat is doing good at:● completeness - yes!■ strong completeness - every crashed node is eventually detected by all correct nodes. ● Accuracy - high!■ Heart-beat problems ?● Network load: N^2■
  • 19. SWIM is trying to ... Separate two problems and solve them one-by-one● Failure detection (👉 “live” peers)○ Membership protocol (👉 list of peers)○ Optimization● Reduce network load○ Failure detection○ decrease processing time● increase accuracy●
  • 20. Failure Detection properties One step back...● The two properties of a distributed system□ Safety - nothing bad ever happens○ Liveness - something good eventually happens.○ Failure Detection properties● Completeness (L) - failure detector would find the node(s) that finally crashed in the system.  □ Accuracy (S) - correct decisions that the failure detector has made in a node.□
  • 21. Failure Detection properties Degree of completeness● depends on number of crashed nodes is suspected by a failure detector in a certain period □ Strong completeness - every faulty node is eventually permanently suspected by every non- faulty node ○ Weak completeness - every faulty node is eventually permanently suspected by some non-faulty node ○ Degree of accuracy● depends on number of mistakes that a failure detector made in certain period□ Strong accuracy - no node is suspected (by any node) before it crashes○ Weak accuracy - some non-faulty node is never suspected○ Eventual strong accuracy - after some time, system becomes strong accuracy.○ Eventual weak accuracy - after some time, system becomes weak accuracy.○
  • 22. SWIM Failure Detection Each node in set of N node● Choose a random peer○ Ping - ACK□ Indirect Ping (iff no ACK)○ Choose k random peers□ indirect Ping○ Evaluation: completeness: every nodes will be pinged!● accuracy: “high” (🔍)● speed of detection: 1 * Interval● network load: (4*k + 2) * N ~ 0(N)●
  • 23. SWIM Membership Protocol Aware of join / leave nodes● Motivated by Gossip● Piggy-back approach■ Infection-style○ ping is sent to random peer□ eventually (weakly) consistent□ updates send peer-to-peer□
  • 24. SWIM - Optimization Suspicion state - to improve accuracy Trade-off between failure detection time and false positives.● Introduce suspicion state.● A 👉 B: Ping! Suspect C failed■ B 👉 A: ACK!■ A few moment later■ A, B 👉 C: Ping! Are you dead ?□ C 👉 A,B: ACK! (i’m not 😋)□ State FSM
  • 25. SWIM - Optimization Round-robin probe peer selection Randomly sort peer set■ Ping in round-robin order■ Evaluation: Completeness: increase, time-bounded○ State FSM
  • 26. SWIM - Limitations Node leave vs fail● Re-joining● Event ordering● Message encryption● Peer metadata● Custom payload● Network participants● More details:  https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
  • 27. SWIM - Implementation memberlist https://github.com/hashicorp/memberlist● serf, consul, etcd are relying on swim-based memberlist for failure detection and group membership. ●
  • 28. Other “announced” applications Cassandra internal - understand gossip https://www.youtube.com/watch?v=FuP1Fvrv6ZQ● AWS S3 gossip http://status.aws.amazon.com/s3-20080720.html● Slicing structured overlay network T-MAN  https://www.researchgate.net/publication/225403352_T-Man_Gossip- Based_Overlay_Topology_Management ● https://managementfromscratch.wordpress.com/2016/04/01/introduction-to-gossip●