Shepherd: Node Monitors for Fault-Tolerant Distributed Process Execution in OSIRIS

Slides related to WOWST 2010.

Speaker notes
  • Present on-going work aimed at improving OSIRIS' fault-tolerance capabilities.
    Processes can be thought of as programs that coordinate the invocation of distributed web services.
    Late binding of service instances, in conjunction with load-balancing strategies, already offers self-* properties.
    Transactional guarantees: the system is completely resilient to temporary node failures.
    Also, thanks to late binding, permanent failures of nodes that participate in the execution of a process instance but are not involved in a computation at the moment of failure do not affect the execution.
  • The node migrates control of the process execution to one or more successor nodes by delivering an
    activation token containing flow-control information and the whiteboard.
  • If the intended successor has failed, a replacement node is found through late binding and the
    activation token is delivered there instead.
  • Hardware, network, or service failures: if a node becomes temporarily disconnected from the network,
    the system is still able to recover. The node keeps retrying to pass on the results until it succeeds.
    This works very well in controlled environments.
  • Worker nodes (WNs) are assigned to Shepherds, forming herds.
    Shepherds are organized in pools; each pool has a leader, and the shepherds in a pool share state.
    The Shepherd layer provides persistence of the process state and triggers process activities.
  • The leader of a pool communicates an activation key Ki to a WN.
    Using Ki, the WN gets the process state from the shared memory layer (SML).
    The WN writes the next process activity under a new key Ki+1 to the SML.
    The WN sends the new activation key Ki+1 to its assigned pool of Shepherds.
    The leader of that pool forwards the activation key to another pool of Shepherds.
    A further step deletes the stale entries from the shared memory.
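
    As a rough illustration of this key handoff, here is a minimal Python sketch of the worker-node side.
    All names (sml, pool, invoke, ack_supervision, ack_write) are illustrative assumptions, not the actual
    OSIRIS/Shepherd API.

        import uuid

        class WorkerNode:
            """Worker-node side of one activity (sketch only, not the real API)."""

            def __init__(self, sml, service):
                self.sml = sml            # handle to the shared memory layer (SML)
                self.service = service    # local service instance to invoke

            def run_activity(self, k_i, pool):
                pool.ack_supervision(k_i)          # echo Ki back: monitoring starts
                whiteboard = self.sml.read(k_i)    # fetch the process state under Ki
                result, next_type = self.service.invoke(whiteboard)
                k_next = uuid.uuid4().hex          # fresh activation key Ki+1
                self.sml.write(k_next, {**whiteboard, **result})
                pool.ack_write(k_next, next_type)  # Wack: the pool forwards Ki+1
                return k_next                      # stale entries are deleted later
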
  • A unique activation key per activity keeps process activities independent of each other.
    Temporarily failed WNs that have already been replaced are terminated.
    Side effects created by an activity (e.g. by B) that are not stored in the shared memory cannot be undone.
  • A DHT-like structured overlay is combined with the Paxos commit protocol, so that a pool holds
    consistent information about the state of the activity it is supervising (writes are distributed transactions).
    The DHT's fault-detection mechanism is used to elect an appropriate replacement replica for a failed shepherd.
  • Beernet is used as the DHT implementation; with respect to the migration algorithm, it plays only a passive role.
  • The overlay provides the routing mechanism and the failure-detection mechanism, and shepherds keep
    their state relative to the execution of the migration algorithm.
    We use it to assign worker nodes to the herd of a shepherd, to let several shepherds coordinate to form
    a pool, to determine how leader election within a pool proceeds, and to handle communication between
    a worker node and a pool of shepherds.
    Shepherds are physical nodes and WNs are the resources to be stored: worker-node IDs on the circle
    lying in between two shepherd IDs become the herd of the adjacent shepherd.
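
    A minimal sketch of this herd assignment, assuming a SHA-1-based identifier circle and a plain list of
    shepherd ring IDs (both assumptions; this is not the Beernet implementation):

        import bisect
        import hashlib

        ID_SPACE = 2 ** 16                  # size of the identifier circle (assumed)

        def ring_id(key):
            return int(hashlib.sha1(key.encode()).hexdigest(), 16) % ID_SPACE

        def herd_shepherd(worker_addr, shepherd_ids):
            """The first shepherd clockwise from the worker's hashed position owns it."""
            ring = sorted(shepherd_ids)
            idx = bisect.bisect_left(ring, ring_id(worker_addr))
            return ring[idx % len(ring)]    # wrap around the circle

    If that shepherd leaves the ring, re-evaluating herd_shepherd() over the remaining IDs hands its herd
    to the subsequent shepherd, as described above.
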
  • Other factors can be taken into consideration to improve process execution; what is explained above
    is already sufficient to guarantee the correctness of the routing and to enable late binding.
    Aggregate load is one such factor.
  • Shepherd: Node Monitors for Fault-Tolerant Distributed Process Execution in OSIRIS

    1. 1. Shepherd: Node Monitors for Fault-Tolerant Distributed Process Execution in OSIRIS Nenad Stojnić Databases & Information Systems Group
    2. 2. Outline  Self-organizing properties in OSIRIS and current limitations  The Shepherd approach to fault-tolerance  Novel migration algorithm  Shepherd ring: herds, shepherd pools, routing  Binding ring: Service lookup, late binding, load balancing  Summary
    3. 3. OSIRIS Open Service Infrastructure for Reliable and Integrated process Support  Decentralized P2P execution of processes  Web Service Invocation  Fault-tolerant, Self-* properties  Late-binding & Load-balancing  Safe continuation-passing (2PC)  Pub/Sub Meta-data repositories
    4. 4. OSIRIS-Process execution example (diagram: process definition with activities A, B, C)
    5. 5. OSIRIS-Process migration (diagram: process definition, service instances, OSIRIS layer, whiteboard)
    6. 6. OSIRIS-Activity execution (diagram)
    7. 7. OSIRIS-Late binding (diagram)
    8. 8. OSIRIS-Late binding (diagram)
    9. 9. OSIRIS-Successor failure (diagram)
    10. 10. OSIRIS-Successor failure (diagram)
    11. 11. OSIRIS-Successor failure (diagram)
    12. 12. OSIRIS-Migration failure (diagram: 2PC)
    13. 13. OSIRIS-Predecessor failure (diagram)
    14. 14. OSIRIS (diagram)
    15. 15. OSIRIS-Current node failure (diagram)
    16. 16. OSIRIS failure handling
        Failure case            Handling
        Successor failure       Late-binding
        Migration failure       2PC abort
        Predecessor failure     No handling necessary
        Temporary node failure  Recovery from local stable storage
        Current node failure    Process execution stops/hangs; state is lost; no notification
    17. 17. Outline  Self-organizing properties in OSIRIS and current limitations  The Shepherd approach to fault-tolerance  Novel migration algorithm  Shepherd ring: herds, shepherd pools, routing  Binding ring: Service lookup, late binding, load balancing  Summary
    18. 18. Our solution: Shepherd (diagram: a Shepherd Layer monitors the OSIRIS Layer and reads/writes a Shared Memory Layer)
    19. 19. Shepherd Migration Algorithm  Shepherd starts the activity  Picks a worker from the herd  Sends an activation key K0
    20. 20. Shepherd Migration Algorithm  Worker acknowledges supervision  Resends the activation key K0  Start of monitoring
    21. 21. Shepherd Migration Algorithm  Worker reads the whiteboard with the activation key K0
    22. 22. Shepherd Migration Algorithm  Worker finishes execution  Generates a new activation key K1  Determines the service type to continue the execution
    23. 23. Shepherd Migration Algorithm  Worker writes the whiteboard with the activation key K1
    24. 24. Shepherd Migration Algorithm  Worker acknowledges the write of the whiteboard (Wack)  Supervision ends
    25. 25. Shepherd Migration Algorithm  Shepherd migrates to another shepherd  Passes on the activation key K1 and the following service type
    26. 26. Shepherd Migration Algorithm (diagram: the complete key chain K0, K1, K2, K3 exchanged between workers A, B, C and shepherds S1, S2, S3)
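
    Seen from the shepherd side, slides 19-26 amount to the step sketched below in Python. The herd,
    binding_ring and worker objects and the message tuples are assumptions made for illustration; only the
    order in which the activation keys are exchanged follows the slides.

        def supervise_activity(herd, binding_ring, k_i, service_type):
            """One supervision/migration step as performed by a shepherd (pool leader)."""
            worker = herd.pick(service_type)                  # worker of the needed type
            worker.send(("activate", k_i))                    # hand over activation key K_i
            assert worker.recv() == ("supervision_ack", k_i)  # worker echoes K_i: monitoring starts
            # The worker now reads the whiteboard under K_i, runs the activity, writes the
            # updated whiteboard under a fresh key K_{i+1}, and acknowledges the write.
            msg, k_next, next_type = worker.recv()
            assert msg == "write_ack"                         # Wack: supervision ends
            # Migrate: forward K_{i+1} and the next service type to a pool of shepherds
            # monitoring instances of that type (located via the binding ring).
            binding_ring.lookup_pool(next_type).leader.send(("migrate", k_next, next_type))
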
    27. 27. Shepherd failure cases  Failure of worker nodes  Failure of shepherds  Failures in the shared memory
    28. 28. Failure of worker nodes  Replacement node from the herd  Same service type  Fail-safe services  BUT side effects on the Shared Memory must be undone
    29. 29. Failure of shepherds  Shepherds organized in pools, state shared  WN speaks to the pool  Transactional writes → consistency guaranteed  New leader learns the current state from the pool
    30. 30. Failures in shared memory  Chord-based  Replicated transactional storage  Successful writes are persistent  A failed read/write can always be retried
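
    Because successful writes to the replicated storage are persistent, a failed read or write can simply be
    retried. A minimal retry wrapper might look as follows; the attempt budget, the backoff, and the use of
    IOError as the failure signal are assumptions:

        import time

        def retry(op, attempts=8, base_delay=0.2):
            """Call op() until it succeeds, backing off between attempts."""
            for i in range(attempts):
                try:
                    return op()
                except IOError:                        # stand-in for a failed SML read/write
                    time.sleep(base_delay * (2 ** i))  # exponential backoff, then retry
            raise RuntimeError("shared-memory operation kept failing")

        # e.g. whiteboard = retry(lambda: sml.read(k_i))
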
    31. 31. Shepherd ring  Used for:  Worker node to shepherd assignment  Routing of messages from WN to shepherds  Pool construction  Based on the Chord structured overlay  Identifier circle of Shepherd node IDs and Worker node IDs (consistent hashing)  Efficient routing: O(log NSh)
    32. 32. Shepherd ring (ring diagram: shepherds S1-S5 on an identifier circle ID0-ID23) deliver(96.76.89.12, join())  Worker requests an assignment to a shepherd  Submits a join message to any known shepherd  If a shepherd leaves the ring, the subsequent one takes over its herd
    33. 33. Shepherd ring h(96.76.89.12) = ID17  Shepherd hashes the worker ID  Routes the join message to another shepherd  Routing continues until the responsible shepherd is found
    34. 34. Shepherd ring IP17 = 96.76.89.12  Worker joins the herd  Exchanges heartbeats with its shepherd
    35. 35. Shepherd pools  Symmetric replication strategy:  Node ID congruence-modulo equivalence classes  Responsible for x “knows” entire class of x  Pool = all responsibles for a class  Transactional guarantees  Paxos consensus
    36. 36. Shepherd pools (ring diagram) Equivalence class: ID1, ID9, ID17; congruence modulo: 8; pool: S2, S3, S5; pool size: 3
    37. 37. Shepherd pools (ring diagram) Equivalence class: ID2, ID10, ID18; pool: S2, S3, S5; pool size: 3
    38. 38. Shepherd pools (ring diagram) Equivalence class: ID3, ID11, ID19; pool: S2, S3, S1; pool size: 3
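
    The pool construction of slides 35-38 can be reproduced in a few lines of Python. The ring size of 24
    and the pool size of 3 (hence congruence modulo 24/3 = 8) are taken from the slide example; the exact
    shepherd positions are not given there, so pool_for() is shown generically.

        import bisect

        RING_SIZE = 24                 # identifier circle size (slide example)
        POOL_SIZE = 3
        STEP = RING_SIZE // POOL_SIZE  # 8: the congruence modulo used in the slides

        def equivalence_class(ring_id):
            return sorted((ring_id + k * STEP) % RING_SIZE for k in range(POOL_SIZE))

        def successor(ring_id, shepherd_ids):
            ring = sorted(shepherd_ids)
            return ring[bisect.bisect_left(ring, ring_id) % len(ring)]

        def pool_for(ring_id, shepherd_ids):
            # The pool consists of the shepherds responsible for every ID in the class.
            return {successor(x, shepherd_ids) for x in equivalence_class(ring_id)}

        print(equivalence_class(1))    # [1, 9, 17], matching slide 36
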
    39. 39. Late binding  Locate a shepherd providing service type T  Shepherd provides type T if it monitors instances of type T  Binding ring  Physical nodes & service types (resources)  Distributed “multimap” data structure  Service type → List of shepherds
    40. 40. Binding ring (ring diagram: overlay nodes O1-O8 over type identifiers T0-T23) store(T, S5)  Storing shepherd S5, which provides service type T  Query for the number of fragments of type T; rnd[1, Nfrag] → 2
    41. 41. Binding ring store(T, S5); Cfrag = 3 (Tfrag1: S1; Tfrag2: S4; Tfrag3: S2, S3)  Fragments of service type T in the ring  Each fragment is a multimap
    42. 42. Binding ring storefrag(Tfrag2, S5)  Random selection of the fragment for storage  If storage is full, create a new fragment and add to it
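
    The store(T, S5) walkthrough of slides 40-42 boils down to a multimap whose values are split into
    bounded fragments. The single-process sketch below stands in for the distributed structure; the fragment
    capacity and all names are assumptions.

        import random

        FRAGMENT_CAPACITY = 4                      # assumed per-fragment limit

        class BindingRingSketch:
            def __init__(self):
                # service type -> list of fragments, each a bounded list of shepherd IDs
                self.fragments = {}

            def store(self, service_type, shepherd):
                frags = self.fragments.setdefault(service_type, [[]])
                frag = random.choice(frags)        # rnd[1, Nfrag]: pick a fragment at random
                if len(frag) >= FRAGMENT_CAPACITY: # storage full: create a new fragment
                    frag = []
                    frags.append(frag)
                frag.append(shepherd)

            def lookup(self, service_type):
                # Late binding: any shepherd stored under the type can be chosen.
                return [s for frag in self.fragments.get(service_type, []) for s in frag]

        # ring = BindingRingSketch(); ring.store("T", "S5"); ring.lookup("T") -> ['S5']
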
    43. 43. Load balancing  Optimize performance  Extended binding ring  Shepherd average load  Publish/subscribe of load information
    44. 44. Load balancing (shepherd ring diagram; WN3 load = 40%, WN5 load = 60%)  Shepherd ring  Worker nodes publish load to their shepherd
    45. 45. Load balancing (binding ring diagram)  Binding ring  Avg. load of a shepherd for a service type  Avg. load lists sorted in fragments
    46. 46. Load balancing (binding ring diagram; Cbest = <Cfrag1, 50%>)  Start contest  Least loaded type fragment becomes the best fragment
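
    The "contest" of slides 45-46 picks, among the fragments of a service type, the one whose least-loaded
    shepherd reports the lowest average load. A small Python sketch, using the load figures shown on slide 46
    (the assignment of shepherds to fragments is read off the diagram and is an assumption):

        def best_fragment(fragments):
            """fragments: fragment name -> list of (shepherd, average load in %)."""
            least_loaded = {
                name: min(entries, key=lambda e: e[1])  # least-loaded shepherd per fragment
                for name, entries in fragments.items() if entries
            }
            # The contest winner is the fragment offering the lowest average load.
            return min(least_loaded.items(), key=lambda kv: kv[1][1])

        c_fragments = {
            "Cfrag1": [("S1", 50.0)],
            "Cfrag2": [("S2", 60.0), ("S3", 75.0)],
        }
        print(best_fragment(c_fragments))  # ('Cfrag1', ('S1', 50.0)), i.e. Cbest = <Cfrag1, 50%>
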
    47. 47. Outline  Self-organizing properties in OSIRIS and current limitations  The Shepherd approach to fault-tolerance  Novel migration algorithm  Shepherd ring: herds, shepherd pools, routing  Binding ring: Service lookup, late binding, load balancing  Summary
    48. 48. Summary  Shepherd:  Improved self-* properties in OSIRIS  Novel, completely decentralized architecture  Future Work:  Implementation & experimental evaluation  Extend to stream-enabled services  Customize transactional protocols for efficiency  Economic cost model (trade-off: performance vs. robustness)
    49. 49. Thank you for your attention! Questions ?
