Improving Robustness In Distributed Systems
Published in: Technology, News & Politics

Transcript

  • 1. Improving Robustness in Distributed Systems
    • Per Bergqvist [email_address]
    • Erlang User Conference 2001 (courtesy CellPoint Systems AB)
  • 2. Design base
    • Cluster of cooperating hosts
    • Erlang and C
    • COTS hardware based
    • Unix based (i.e. Solaris or Linux)
    • 10/100/1000BASE-T backplane ("system area network")
  • 3. Cluster
    • Shared, distributed, system configuration
    • Each host has ONE cluster controller
    • Dispatch and supervise worker tasks
    • Master cluster controller: holds configuration database (persistent replica)
    • Slave cluster controller: gets configuration from master cluster controllers
    • Cluster is DOWN when all master cluster controllers are inaccessible
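
The DOWN rule on this slide can be written as a small predicate. The Python sketch below is illustrative only: the `Controller` record and its field names are hypothetical and are not taken from the original system, which was built in Erlang and C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controller:
    """Hypothetical view of one host's cluster controller."""
    host: str
    is_master: bool   # masters hold a persistent configuration replica
    reachable: bool   # reachability as seen by the evaluating host


def cluster_is_down(controllers):
    """Return True when no master cluster controller is reachable."""
    return not any(c.is_master and c.reachable for c in controllers)
```

Reachable slave controllers do not keep the cluster up under this rule, since they have nowhere to get the configuration from.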
  • 4. Typical system (diagram; labels: Firewall Switch Traffic Control)
  • 5. Cluster Key Benefits
    • Single system view
    • Enforces decoupling of parts of O&M from actual traffic processing
  • 6. Implementing a cluster
    • Cluster->Host->Node->NodeData
    • Cluster global parameters
    • Subscription mechanisms for conf. changes
    • Mnesia as configuration database on master cluster controllers
    • Home-brewed configuration distribution to slave controllers (NOT using Mnesia)
    • (Worker) node supervision
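
A subscription mechanism over the Cluster->Host->Node->NodeData hierarchy might look like the following minimal Python sketch. The `ConfigStore` class, its methods, and the tuple-path representation are all assumptions made for illustration; the slides do not describe the actual interface.

```python
from collections import defaultdict


class ConfigStore:
    """Toy configuration tree keyed by (cluster, host, node, key) paths."""

    def __init__(self):
        self._values = {}
        self._subscribers = defaultdict(list)  # path prefix -> callbacks

    def subscribe(self, prefix, callback):
        """Call callback(path, value) for every change at or below prefix."""
        self._subscribers[tuple(prefix)].append(callback)

    def set(self, path, value):
        """Store a value and notify subscribers only if it really changed."""
        path = tuple(path)
        if path in self._values and self._values[path] == value:
            return  # unchanged: nothing to announce
        self._values[path] = value
        for prefix, callbacks in self._subscribers.items():
            if path[:len(prefix)] == prefix:
                for callback in callbacks:
                    callback(path, value)
```

Subscribing on a prefix such as `("cluster", "hostA")` lets a worker follow every parameter of one host without hearing about the others.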
  • 7. Mnesia gotchas
    • First distributed node startup
      • Disallow writes when not all replicas are accessible
      • Use a timeout on table load, then force load
  • 8. ... BUT ...
    • TCP based distribution
    • Network partitioning
  • 9. Network parameters
    • Align TCP retransmission intervals w/ Erlang heartbeats
    • Align TCP and IP rerouting parameters
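
To see why alignment matters, it helps to compare how long each layer takes to give up. The Python sketch below is a rough back-of-the-envelope model, not a description of any particular stack: it assumes plain exponential backoff with a cap, and the example figures (0.2 s initial retransmission timeout, 120 s cap, 15 retries) are Linux defaults used only for illustration; Solaris uses different parameter names and values. The heartbeat window reflects Erlang's `net_ticktime` kernel parameter, under which an unresponsive node is detected after roughly 0.75 to 1.25 times the configured value.

```python
def tcp_give_up_seconds(initial_rto, max_rto, retries):
    """Approximate time before TCP abandons a connection that gets no ACKs.

    Assumes the timeout doubles after every unanswered transmission and is
    capped at max_rto. The first transmission plus `retries`
    retransmissions each wait one timeout.
    """
    total = 0.0
    rto = initial_rto
    for _ in range(retries + 1):
        total += min(rto, max_rto)
        rto *= 2
    return total


def heartbeat_window(net_ticktime):
    """Earliest and latest time at which a silent node is declared down."""
    return (0.75 * net_ticktime, 1.25 * net_ticktime)
```

With these assumed figures, `tcp_give_up_seconds(0.2, 120.0, 15)` comes to about 924.6 seconds, while `heartbeat_window(60)` spans 45 to 75 seconds, so an untuned TCP layer can keep retrying long after the heartbeat has already declared the peer dead.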
  • 10. Typical system II: Dual backplane (diagram; labels: Firewall Switch Traffic Control)
  • 11. Erlang multi-homing problem (diagram: Host A, Host B, Host C)
  • 12. Multi-home Erlang w/ TCP
    • Add an alias interface to the loopback i/f
    • Patch TCP distribution to bind to the alias
    • Publish the alias interface via (all wanted) real hw i/fs
      • Method 1: Static routes and gratuitous/proxy arp
      • Method 2: Use new (routing) protocol
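
The "bind to alias" step can be illustrated with ordinary sockets. The slide refers to a patch of the Erlang TCP distribution; the Python sketch below only shows the general idea of pinning both the listening and the outgoing side to one address, so that peers always see the alias rather than a physical interface address. The function names are hypothetical, and `127.0.0.1` would stand in for the alias address when trying this on a single machine.

```python
import socket


def listen_on_alias(alias_ip, port=0):
    """Listening socket bound to the alias address only (not 0.0.0.0)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((alias_ip, port))
    srv.listen()
    return srv


def connect_from_alias(alias_ip, peer, timeout=5.0):
    """Outgoing connection whose source address is pinned to the alias."""
    return socket.create_connection(
        peer, timeout=timeout, source_address=(alias_ip, 0)
    )
```

Because both ends of the connection are identified by the alias, the connection itself does not depend on which physical interface currently carries the packets.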
  • 13. ARP method
    • Implement a utility to:
      • broadcast unsolicited ARP responses
      • respond to ARP requests for the alias i/f address
    • Add static routes on all far end systems
    • NOTE: all real i/fs need to be on the same IP subnet
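
The two jobs of such a utility can be sketched at the packet level. The Python code below builds an unsolicited (gratuitous) ARP reply for the alias address and recognises ARP requests aimed at it. It is a minimal sketch of the standard Ethernet/IPv4 ARP layout, not the original utility; the function names are hypothetical, and setting the target hardware address to broadcast is one common choice among several.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_REQUEST, ARP_REPLY = 1, 2


def gratuitous_arp_reply(src_mac: bytes, alias_ip: str) -> bytes:
    """Build a 42-byte broadcast frame announcing alias_ip at src_mac."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    ip = socket.inet_aton(alias_ip)
    ethernet = struct.pack("!6s6sH", BROADCAST_MAC, src_mac, ETHERTYPE_ARP)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=reply;
    # sender and target protocol address are both the announced alias.
    arp = struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, ARP_REPLY,
                      src_mac, ip, BROADCAST_MAC, ip)
    return ethernet + arp


def is_arp_request_for(frame: bytes, alias_ip: str) -> bool:
    """True if frame is an ARP request asking who has alias_ip."""
    if len(frame) < 42 or frame[12:14] != b"\x08\x06":
        return False
    oper = struct.unpack("!H", frame[20:22])[0]
    return oper == ARP_REQUEST and frame[38:42] == socket.inet_aton(alias_ip)
```

Actually putting such a frame on the wire needs a raw link-layer socket and elevated privileges, which is platform specific and left out here.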
  • 14. New routing protocol
    • Broadcast (Ethernet frames) what you have, including interface priority
    • Let the far end select path based on what/when they receive
    • Far end dynamically sets up host routes
    • Use short retransmission intervals
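
The far-end selection step could work along these lines. This Python sketch is an assumption-laden illustration, since the slides give no wire format or selection rule: the `Advert` record, the convention that a higher `priority` value wins, and the `max_age` expiry are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Advert:
    """Latest announcement heard for an alias over one physical path."""
    alias_ip: str       # loopback alias address being announced
    via_ip: str         # real interface address the announcement came from
    priority: int       # assumed: higher value means more preferred
    received_at: float  # local receive time in seconds


def select_path(adverts: Iterable[Advert], alias_ip: str,
                now: float, max_age: float) -> Optional[Advert]:
    """Pick the preferred path among announcements that are still fresh."""
    fresh = [a for a in adverts
             if a.alias_ip == alias_ip and now - a.received_at <= max_age]
    if not fresh:
        return None  # no usable path: the host route should be withdrawn
    return max(fresh, key=lambda a: (a.priority, a.received_at))
```

The `via_ip` of the chosen announcement would become the next hop of the host route for the alias. Short retransmission intervals allow a small `max_age`, so a dead path is dropped quickly and the next-best path takes over.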
  • 15. Erlang multi-homing resolved? (diagram: Host A, Host B, Host C)
  • 16. Summing up
    • Erlang can support multihoming with some additional work
    • By using a loopback alias i/f, link failure becomes a routing problem (the peer-to-peer association is kept intact)
    • Solaris TCP/IP stack parameters are:
      • hard to find (only in out-of-date app. notes)
      • hard to set "right"
      • host global
    • A distribution mechanism with built-in support for multi-homing is preferred
  • 17. Erlang Distribution over SCTP
    • Per Bergqvist et al. [email_address]
    • Erlang User Conference 2002
