CS 704D: Distributed Mutual Exclusion & Memory



Distributed mutual exclusion, Lamport's algorithm, the Ricart-Agrawala algorithm, and distributed shared memory systems


  1. 1. Debasis Das
  2. 2. Mutual Exclusion CS 704D Advanced OS 2
  3. 3. Complexities in Distributed Systems  Absence of shared memory  Inter-node communication delays can be considerable  The global system state cannot be observed by the constituent machines, due to communication delays, component failures, and the absence of shared memory  There are many more modes of failure, yet failing soft is a goal
  4. 4. Some Considerations  Policies/strategies developed for a distributed system can be made applicable in the uniprocessor case  However, policies/strategies developed for the uniprocessor case cannot be extended to the distributed case  They can be simulated by adding a central resource allocator  This increases traffic to the central allocator, and the system fails when the allocator fails  Election of a successor would then be needed
  5. 5. Required Assumptions  Messages exchanged by a pair of communicating processes are received in the same order as they were generated (pipelining property)  Every message is received without errors and without duplicates  The underlying network keeps all nodes fully connected: any node can communicate with every other node
  6. 6. Desirable Properties of Algorithms  All nodes should have an equal amount of information  Each node makes decisions on the basis of local information; the algorithm should ensure that nodes make consistent & coherent decisions  All nodes reach decisions with roughly equal effort  Failure of a node should not cause a complete breakdown; the ability to reach a decision and to access the resources should not be affected
  7. 7. Time & Ordering of Events Happened-Before Relationship  The logical clock needs to ensure  If a and b are events in the same process and a comes before b, then a->b  If event a is the sending of a message and b is the receipt of that message in another process, then a->b  It is a transitive relationship; that is, if a->b and b->c then a->c  If a and b have no happened-before relationship, then a and b are said to be concurrent
  8. 8. Time & Ordering of Events Logical Clock Properties  If a->b then C(a) < C(b)  The clock condition is satisfied if  a and b are events in a process Pi and a comes before b, then Ci(a) < Ci(b)  a is the sending of message m by process Pi and b is its receipt by process Pj, then Ci(a) < Cj(b)
  9. 9. Time & Ordering of Events Logical Clock Implementation  Process Pi increments its clock Ci between successive events  Message m is time-stamped so that T(m) = Ci(a)  The receiving process Pj adjusts its clock to Cj = max(Cj, T(m)) + 1, so the receive event is stamped strictly after the send
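The tick, stamp, and receive rules above can be sketched as a small Python class. This is illustrative only (the slides contain no code, so the class and method names are mine), with no real message passing:

```python
class LamportClock:
    """Minimal sketch of a Lamport logical clock."""
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: increment the clock between successive events.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned value is the stamp T(m) = Ci.
        return self.tick()

    def receive(self, tm):
        # Cj = max(Cj, T(m)) + 1: the receive is stamped after the send.
        self.time = max(self.time, tm) + 1
        return self.time
```

For example, a process whose clock reads 2 that receives a message stamped 10 jumps to 11, while one that receives a stale stamp simply ticks forward.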
  10. 10. Total Ordering  a => b (total order) holds only when  Ci(a) “less than” Cj(b), or  Ci(a) = Cj(b) and Pi “less than” Pj  A simple way to implement the “less than” relation on processes is to assign a unique number to each process and define Pi “less than” Pj as i < j
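The tie-breaking rule above amounts to comparing (timestamp, process number) pairs lexicographically, which Python tuples do natively (the event data here is made up for illustration):

```python
# Events as (Lamport timestamp, process number) pairs. Ties on the
# timestamp are broken by the unique process number, giving a total order.
events = [(3, 2), (3, 1), (1, 3), (2, 2)]
total_order = sorted(events)  # lexicographic: timestamp first, then pid
# total_order == [(1, 3), (2, 2), (3, 1), (3, 2)]
```

Note that the two events stamped 3 are concurrent; the process number decides their order arbitrarily but consistently at every node.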
  11. 11. Lamport’s Algorithm  Initiator i: process Pi requires exclusive access to a resource; it sends a time-stamped request message (Ti, i), where Ti = Ci, to all the other processes  Other processes (j, j != i): when Pj receives the request, it places the request on its own queue and sends a reply with time stamp (Tj, j) to process Pi  Pi is allowed access only when  Pi’s request is at the front of its queue, and  all replies are time-stamped later than Pi’s time stamp  Pi releases the resource by sending a suitably time-stamped release message  On receiving it, Pj removes Pi’s request from its request queue Cost: 3(N-1) messages; works best on bus-based systems where broadcast costs are minimal
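The two-part entry condition above (request at the head of the queue, replies from everyone stamped later) can be sketched with a priority queue of (timestamp, pid) requests. This is a local-state sketch only; the class name and the in-process message handlers stand in for real network delivery:

```python
import heapq

class LamportMutex:
    """Per-node state for Lamport's mutual exclusion (sketch, no networking)."""
    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers    # ids of the other processes
        self.queue = []       # min-heap of (timestamp, pid) requests
        self.replies = {}     # latest reply timestamp seen from each peer

    def on_request(self, ts, pid):
        heapq.heappush(self.queue, (ts, pid))

    def on_reply(self, ts, pid):
        self.replies[pid] = ts

    def can_enter(self, my_ts):
        # Enter only when our own request heads the queue and every peer
        # has replied with a timestamp later than our request's.
        at_head = bool(self.queue) and self.queue[0] == (my_ts, self.pid)
        all_later = all(self.replies.get(p, -1) > my_ts for p in self.peers)
        return at_head and all_later
```

Because (timestamp, pid) pairs form a total order, every node's queue agrees on which request is at the front, which is what makes the local check safe.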
  12. 12. Ricart-Agrawala Algorithm  Initiator i: process Pi requires exclusive access to a resource; it sends a time-stamped request message (Ti, i), where Ti = Ci, to all the other processes  Other processes (j, j != i): when Pj receives the request, it reacts as follows  If Pj is not requesting the resource, it sends a time-stamped reply  If Pj needs the resource and its own time stamp precedes Pi’s, Pi’s request is retained (the reply is deferred); otherwise a time-stamped reply is returned  Pi is allowed access once it has received replies from all the other processes  On releasing the resource, Pi sends a reply for each deferred request Cost: 2(N-1) messages
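The receive-side rule above (reply at once, or defer until release) is the heart of the algorithm and can be sketched as a single decision function. The state names and parameters here are illustrative, not from the slides:

```python
def on_request(my_state, my_ts, my_pid, req_ts, req_pid, deferred):
    """Ricart-Agrawala receive-side rule (sketch).

    my_state is 'idle', 'wanted', or 'held'.
    Returns True if a reply is sent immediately, False if it is deferred
    (appended to `deferred` and answered only on release).
    """
    if my_state == 'held':
        deferred.append(req_pid)      # we hold the resource: reply on release
        return False
    if my_state == 'wanted' and (my_ts, my_pid) < (req_ts, req_pid):
        deferred.append(req_pid)      # our own request has priority
        return False
    return True                       # not competing: reply at once
```

Deferring instead of queueing and releasing separately is what saves the N-1 release messages relative to Lamport's algorithm, giving the 2(N-1) total.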
  13. 13. Distributed Shared Memory  A software abstraction over loosely coupled systems  Provides a shared-memory style of operation over the underlying IPC/RPC mechanisms  Can be implemented in the OS kernel or in the runtime system  Also known as a Distributed Shared Virtual Memory (DSVM) system  The shared space exists only virtually
  14. 14. DSM Architecture [Figure: several nodes, each with CPU(s) and a memory-mapping layer, joined by a communication network beneath a common distributed shared memory layer]
  15. 15. DSM Architecture  Unlike in tightly coupled systems, this shared memory is entirely virtual  It is partitioned into blocks  Local memory is treated as a large local cache  If the requested data is not available locally, a block fault is generated  The OS, through a message, requests the block from the node holding it and migrates it to the node where the fault occurred  Data may also be replicated locally  Configurations vary depending on the replication and migration policies used
  16. 16. Design Issues  Granularity (block size): smaller blocks mean more faults and more traffic; larger blocks favor jobs with higher locality  Structure: the layout of data, which depends on the application  Coherence & access synchronization: like the cache-coherence situation in a shared-memory multiprocessor  Data location & access: what data is to be replicated, and where it is located  Replacement strategy  Thrashing  Heterogeneity
  17. 17. Granularity
  18. 18. Block Size Selection Factors  Large block sizes are favored, as the overheads of transferring a small block and a large one are not too different  Paging overhead: paging overheads also favor larger block sizes; applications should thus have greater locality of reference  Directory size: smaller blocks mean a larger directory and larger management overhead  Thrashing: thrashing is likely to increase with larger block sizes  False sharing: larger block sizes increase its probability; the consequence is higher thrashing
  19. 19. Page Size as Block Size  The page size is the preferred DSM block size  Advantages are  Existing page-fault hardware can be used as the block-fault mechanism; memory coherence can be handled in the page-fault handlers  Access control can be managed with existing memory-mapping systems  If the page size does not exceed the packet size, there is no extra transfer overhead  The page size has proved, over time, to be the right unit as far as memory contention is concerned
  20. 20. Structure
  21. 21. Structure of Shared Memory Space  Approaches to structuring  No structure: a linear array of memory; easy to design  By data type: granularity per variable; complex to handle  As a database: a tuple space (associative memory); primitives need to be added to languages, and access to shared data is not transparent
  22. 22. Consistency Models
  23. 23. Consistency Models  Strict consistency  Sequential consistency  Causal consistency  Pipelined random access memory (PRAM) consistency  Processor consistency  Weak consistency  Release consistency
  24. 24. Strict Consistency Model  The value read from a memory address is the same as that of the latest write at that address  Writes become instantly visible to all nodes  Needs an absolute ordering of memory read/write operations; a global time is required (to define “most recent”)  Nearly impossible to implement
  25. 25. Sequential Consistency Model  All processes should see the same ordering of reads and writes  The exact interleaving does not matter  No memory operation is started until earlier operations have completed  Acceptable in most applications
  26. 26. Causal Consistency Model  Operations are seen in the same (correct) order when they are causally related  If w2 follows w1 and is causally related to it, then w1, w2 is the order every process should see  Writes may be seen in different orders when they are not causally related
  27. 27. Pipelined RAM Consistency Model  All writes of a single process are seen in the same order by all other processes (as in a pipeline)  However, writes by different processes may appear in different orders  (w11, w12) and (w21, w22) can be seen either as (w11, w12) followed by (w21, w22), or as (w21, w22) followed by (w11, w12)  Simple to implement
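What PRAM consistency permits can be made concrete by enumerating every observation order that keeps each process's own writes in sequence. A small sketch (the function and write labels are mine, for illustration):

```python
from itertools import combinations

def pram_orders(p1, p2):
    """All interleavings of two processes' write sequences that preserve
    each process's own write order -- the observation orders PRAM
    consistency allows (sketch)."""
    n, m = len(p1), len(p2)
    results = []
    for slots in combinations(range(n + m), n):  # positions for p1's writes
        order, i, j = [], 0, 0
        for k in range(n + m):
            if k in slots:
                order.append(p1[i]); i += 1
            else:
                order.append(p2[j]); j += 1
        results.append(order)
    return results

orders = pram_orders(["w11", "w12"], ["w21", "w22"])
# 6 legal orders; w11 always precedes w12, and w21 always precedes w22,
# but the two processes' writes may interleave freely.
```

Different observers may each pick a different one of these six orders, which is exactly the freedom sequential consistency removes.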
  28. 28. Processor Consistency Model  Adds memory coherence to the PRAM model  That is, if the writes are to a particular memory location, then all processes should see those writes in the same order, which maintains memory coherence
  29. 29. Weak Consistency Model  Changes to memory can be propagated after a set of changes has happened (for example, on leaving a critical section)  An isolated access to a variable is rare; usually there will be several accesses and then none at all  The difficulty is that the system cannot know when to show the changes  Application programmers handle this through a synchronization variable  Necessarily  All accesses to the sync variable must follow the strongest consistency (sequential)  All pending writes must be completed before an access to the sync variable is allowed  All previous accesses to the sync variable must be completed before an ordinary data access is allowed
  30. 30. Release Consistency Model  The weak consistency model requires that  all changes made by a process are propagated to all nodes, and  all changes at other nodes are propagated to the local node  Acquire and release operations are used for synchronization, so that only one of the two propagations above needs to be done at each point
  31. 31. Discussion of Models  The strict consistency model is difficult to implement and almost never implemented  The sequential consistency model is the most commonly used  Causal, PRAM, processor, weak, and release consistency are implemented in many DSM systems; programmers need to intervene  Weak and release consistency provide explicit sync variables to help with consistency
  32. 32. Implementing the Sequential Consistency Model  Implementing sequential consistency depends on which replication/migration strategies are allowed  Replication/migration strategies  Non-replicated, non-migrating blocks (NRNMBs)  Non-replicated, migrating blocks (NRMBs)  Replicated, migrating blocks (RMBs)  Replicated, non-migrating blocks (RNMBs)
  33. 33. NRNMB  All requests for a block are routed, through the OS and MMU, to the single copy, which is neither replicated nor moved anywhere  This can cause  a bottleneck, because memory accesses are serialized  Parallelism is not possible
  34. 34. NRMB  No copies; if required, the entire block is moved to the node that needs it  Advantages  No communication cost for accesses once the block is local  Applications can take advantage of locality; applications with high locality will perform better  Disadvantages  Prone to thrashing  No advantage from parallelism
  35. 35. Data Locating in NRMB  Broadcast  When a fault happens, a request is broadcast and the current owner sends the block  Broadcasting causes communication overheads  Centralized server  The request is sent to the server; the server asks the node holding the block to send it to the requesting node, then updates the location information  Fixed distributed server  The fault handler finds the mapping of the block to a specific server, sends the request, and gets the block  Dynamic distributed server  A fault triggers a local lookup of the probable owner; the request goes to that node, which either has the block or forwards to its own probable owner; the requester gets the block and the location information is updated
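The dynamic-distributed-server lookup above can be sketched as a chase along "probable owner" hints. The hint update at the end (pointing every visited node at the real owner) is an assumption of mine that keeps later lookups short; the function and variable names are illustrative:

```python
def locate_block(start, prob_owner, true_owner):
    """Follow each node's 'probable owner' hint until the real owner is
    reached, then update the hints along the path (sketch)."""
    path, node = [], start
    while node != true_owner:
        path.append(node)
        node = prob_owner[node]       # follow the hint
    for visited in path:              # point the whole path at the real owner
        prob_owner[visited] = true_owner
    return node

# Node 1 thinks 2 owns the block, 2 thinks 3 does, 3 knows 4 does.
owners = {1: 2, 2: 3, 3: 4, 4: 4}
locate_block(1, owners, 4)            # chases 1 -> 2 -> 3 -> 4
```

After one lookup, nodes 1, 2, and 3 all point directly at node 4, so the chain does not grow with repeated faults.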
  36. 36. RMB  Replication is used to increase parallelism  Reads can be done locally; writes have overheads  Systems with a high read/write ratio can amortize the write overhead over many reads  Maintaining coherence across the replicated blocks is an issue  The two basic protocols used are  Write-invalidate  Write-update
  37. 37. Coherence Protocols  Write-invalidate  On a write fault, the fault handler copies the block from one of the nodes to its own  It invalidates all the other copies, then writes the data  If another node needs the block now, the updated block is replicated to it  Write-update  On a write fault, copy the block to the local node and update the data  Send the address & new data to all the replicas  Operation resumes after all the writes are done
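The write-invalidate protocol above can be sketched with a toy single-block model, where a set of node ids stands in for the directory of valid copies (class and attribute names are mine; real DSMs do this per block over the network):

```python
class WriteInvalidateDSM:
    """Toy write-invalidate coherence for one block (sketch, no networking)."""
    def __init__(self, nodes):
        self.copies = {n: None for n in nodes}  # node -> cached value
        self.valid = set()                      # nodes holding a valid copy
        self.value = 0                          # current block contents

    def read(self, node):
        if node not in self.valid:              # read fault: fetch a copy
            self.copies[node] = self.value
            self.valid.add(node)
        return self.copies[node]

    def write(self, node, value):
        self.valid = {node}                     # invalidate all other copies
        self.value = value
        self.copies[node] = value

dsm = WriteInvalidateDSM([1, 2, 3])
dsm.read(2)        # node 2 caches the block
dsm.write(1, 42)   # node 2's copy is invalidated
dsm.read(2)        # node 2 refetches and sees 42
```

A write-update variant would instead loop over `valid` pushing the new value to every copy, which is why it needs a global sequencer and more traffic, as the next slide notes.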
  38. 38. Comparison  Write-update typically needs a global sequencer to make sure all nodes see the writes in the same sequence  The operations are also full writes  Together these impose a significant communication overhead  Write-invalidate does not need all that, just an invalidation signal  Write-invalidate is thus the more often used method
  39. 39. Data Locating in the RMB Strategy  The owner of a block needs to be located: the node that most recently had write access  The nodes that hold a valid copy also need to be tracked  Use one of the following  Broadcasting  Centralized server algorithm  Fixed distributed server algorithm  Dynamic distributed server algorithm
  40. 40. RNMB  Replicas are maintained, but blocks do not migrate  Consistency is maintained by updating all the replicas through a write-update-like process
  41. 41. Data Locating in the RNMB Strategy  Replica locations do not change  Replicas are kept consistent  Read requests can go to any node that has the data block  Writes go through a global sequencer
  42. 42. Munin: A Release-Consistent DSM System  Structure: a collection of shared variables  Each shared variable goes into a separate memory page  acquireLock and releaseLock operations are used  A different consistency protocol is applied for each type of shared variable used in the system  Read-only, migratory, write-shared, producer-consumer, result, reduction, and conventional
  43. 43. Replacement Strategy
  44. 44. Replacement Strategy  Shared memory blocks are replicated and/or migrated, so two things need to be decided  Which block is to be replaced  Where the replaced block should go
  45. 45. Blocks to Replace  Usage-based vs. non-usage-based policies  Fixed-space vs. variable-space policies  Block classes, in replacement priority order  Unused  Nil  Read-only  Read-owned  Writable
  46. 46. Place for the Replaced Block  Use local secondary storage  Use the memory space of other nodes: store the block in free memory at some other node; free-memory status needs to be exchanged, piggybacked on normal communication messages
  47. 47. Thrashing
  48. 48. Thrashing Situations  DSM allows migration, so migration back and forth can lead to thrashing  Data blocks keep migrating between nodes due to interleaved accesses by processes  Read-only blocks are repeatedly invalidated soon after replication
  49. 49. Thrashing Reduction Strategies  Application-controlled locks  Locking a block to a node for a time t; deciding t could be a very difficult issue  Tuning the coherence strategy to the usage pattern; the transparency of the memory system is then compromised
  50. 50. Other Approaches to DSM
  51. 51. Approaches  Data caching managed by the OS  Data caching managed by MMUs  Data caching managed by the language runtime system
  52. 52. Heterogeneous DSM
  53. 53. Features of Heterogeneous DSM  Data conversion  Structuring the DSM as a collection of source-language objects  Allowing only one type of data in a block (has complications)  Memory fragmentation  Compilation issues  The entire page is converted even though only a small part may be used after transfer  Not transparent; user-provided conversion routines may be required
  54. 54. Advantages of DSM
  55. 55. Advantages  A simpler abstraction  Better portability of distributed applications  Better performance for some applications  A flexible communication environment  Ease of process migration