
Notes about concurrent and distributed systems & x86 virtualization


Notes on the most important topics in concurrent and distributed systems, and on some virtualization techniques for the x86 architecture.

Published in: Software


Concurrent and Distributed Systems (Bechini)

MUTUAL EXCLUSION

Volatile: any write to a volatile variable establishes a happens-before relationship with subsequent reads of that same variable. This means that changes to a volatile variable are always visible to other threads. What's more, it also means that when a thread reads a volatile variable, it sees not just the latest change to the volatile, but also the side effects of the code that led up to the change.
Reads and writes are atomic for all variables declared volatile (including long and double variables).

Mutual Exclusion desired properties:
1. ME guaranteed in any case
2. A process outside the Critical Section MUST NOT prevent any other process from entering it
3. No deadlock
4. No busy waiting
5. No starvation

Deadlock: a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does.
A deadlock situation can arise if all of the following conditions hold simultaneously in a system:[1]
1. Mutual Exclusion: at least one resource must be held in a non-shareable mode.[1] Only one process can use the resource at any given instant of time.
2. Hold and Wait or Resource Holding: a process is currently holding at least one resource and requesting additional resources which are being held by other processes.
3. No Preemption: a resource can be released only voluntarily by the process holding it.
4. Circular Wait: a process must be waiting for a resource which is being held by another process, which in turn is waiting for the first process to release the resource. In general, there is a set of waiting processes, P = {P_1, P_2, ..., P_N}, such that P_1 is waiting for a resource held by P_2, P_2 is waiting for a resource held by P_3, and so on until P_N is waiting for a resource held by P_1.[1][7]
These four conditions are known as the Coffman conditions from their first description in a 1971 article by Edward G. Coffman, Jr.[7] If any one of these conditions does not hold, a deadlock cannot occur.

Busy waiting: in software engineering, busy-waiting or spinning is a technique in which a process repeatedly checks to see if a condition is true, such as whether keyboard input or a lock is available. In low-level programming, busy-waits may actually be desirable. It may not be desirable or practical to implement interrupt-driven processing for every hardware device, particularly those that are seldom accessed.

Sleeping lock: as opposed to a spinning lock, this technique puts a thread that is waiting to access a resource to sleep. The thread is paused and its execution stops, so the CPU can perform a context switch and keep working on another process/thread. This saves the CPU time (cycles) that a spinning lock implementation would waste. However, sleeping locks also have a time overhead (suspending and resuming the thread) that should be taken into account when evaluating which solution to adopt.

Starvation: a process is perpetually denied the resources necessary to process its work.[1] Starvation may be caused by errors in a scheduling or mutual exclusion algorithm, but can also be caused by resource leaks, and can be intentionally caused via a denial-of-service attack such as a fork bomb.

Race Condition: (or race hazard) is the behavior of an electronic, software or other system where the output is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when events do not happen in the order the programmer intended. The term originates with the idea of two signals racing each other to influence the output first. Race conditions arise in software when an application depends on the sequence or timing of processes or threads for it to operate properly.

Dekker's alg.: the first known correct solution to the mutual exclusion problem in concurrent programming.
Dekker's algorithm guarantees mutual exclusion, freedom from deadlock, and freedom from starvation.
One advantage of this algorithm is that it doesn't require special Test-and-set (atomic read/modify/write) instructions and is therefore highly portable between languages and machine architectures. One disadvantage is that it is limited to two processes and makes use of busy waiting instead of process suspension. (The use of busy waiting suggests that processes should spend a minimum of time inside the critical section.)
Many modern CPUs execute instructions out of order and may reorder memory accesses. On SMP machines equipped with such CPUs, this algorithm does not work without the use of memory barriers.
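The description of Dekker's algorithm above can be made concrete with a minimal Java sketch. It is an illustration under stated assumptions, not a reference implementation: the class name DekkerDemo, the helper run and the shared counter are hypothetical, and the memory barriers the algorithm needs are obtained here through volatile semantics (a volatile field and an AtomicIntegerArray).

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch of Dekker's algorithm for exactly two threads with ids 0 and 1.
class DekkerDemo {
    // wants[i] == 1 means "thread i wants to enter"; get/set have volatile semantics.
    private static final AtomicIntegerArray wants = new AtomicIntegerArray(2);
    private static volatile int turn = 0;
    private static int counter = 0; // plain shared data guarded by the algorithm

    static void lock(int me) {
        int other = 1 - me;
        wants.set(me, 1);
        while (wants.get(other) == 1) {
            if (turn != me) {
                wants.set(me, 0);                   // back off: it is the other's turn
                while (turn != me) Thread.yield();  // busy wait until the turn changes
                wants.set(me, 1);
            } else {
                Thread.yield();                     // our turn: wait for the other to back off
            }
        }
    }

    static void unlock(int me) {
        turn = 1 - me;      // hand priority to the other thread
        wants.set(me, 0);
    }

    // Hypothetical helper: two threads increment the counter `iterations` times each.
    static int run(int iterations) {
        counter = 0;
        Thread[] threads = new Thread[2];
        for (int id = 0; id < 2; id++) {
            final int me = id;
            threads[id] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock(me);
                    try { counter++; } finally { unlock(me); }
                }
            });
            threads[id].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(50_000)); // two threads, 50 000 increments each
    }
}
```

If the flags and turn were plain (non-volatile) fields, the JIT compiler or the CPU would be free to reorder the accesses, which is exactly the failure mode mentioned for SMP machines.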
Peterson's alg.: a concurrent programming algorithm for mutual exclusion that allows two processes to share a single-use resource without conflict, using only shared memory for communication.
The algorithm uses two variables, flag and turn. A flag[n] value of true indicates that the process n wants to enter the critical section. Entrance to the critical section is granted for process P0 if P1 does not want to enter its critical section or if P1 has given priority to P0 by setting turn to 0.
The algorithm satisfies the three essential criteria for solving the critical section problem (mutual exclusion, progress and bounded waiting), provided that changes to the variables turn, flag[0], and flag[1] propagate immediately and atomically. The while condition works even with preemption.[1]

Filter alg.: generalization of Peterson's alg. to N>2 processes.

HW support for ME: an operation (or set of operations) is atomic, linearizable, indivisible or uninterruptible if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes. Additionally, atomic operations commonly have a succeed-or-fail definition: they either successfully change the state of the system, or have no apparent effect.
● Test_and_Set: atomically writes a new value (typically 1, i.e. "set") to a memory location and returns the old value. Maurice Herlihy (1991) proved that test-and-set has a finite consensus number, in contrast to the compare-and-swap operation. The test-and-set operation can solve the wait-free consensus problem for no more than two concurrent processes.[1] However, more than two decades before Herlihy's proof, IBM had already replaced Test-and-set by Compare-and-swap, which is a more general solution to this problem. It may suffer from starvation. (From the notes: on a multiprocessor system it is not sufficient by itself to guarantee ME; interrupts must also be disabled.)
● Compare_and_Swap (CAS): an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value. This is done as a single atomic operation. It may suffer from starvation. On server-grade multi-processor architectures of the 2010s, compare-and-swap is cheap relative to a simple load that is not served from cache. A 2013 paper points out that a CAS is only 1.15 times more expensive than a non-cached load on Intel Xeon (Westmere-EX) and 1.35 times on AMD Opteron (Magny-Cours).[6] As of 2013, most multiprocessor architectures support CAS in hardware. As of 2013, the compare-and-swap operation is the most popular synchronization primitive for implementing both lock-based and non-blocking concurrent data structures.[4]
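To illustrate how compare-and-swap can provide mutual exclusion, here is a minimal sketch of a spin lock built on AtomicBoolean.compareAndSet. The class CasSpinLockDemo, its counter and the run helper are hypothetical names chosen for the example. Replacing the CAS with locked.getAndSet(true) would give the test-and-set flavour of the same lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a spin lock: the lock is free when `locked` is false.
class CasSpinLockDemo {
    private static final AtomicBoolean locked = new AtomicBoolean(false);
    private static int counter = 0; // plain shared data guarded by the spin lock

    static void lock() {
        // CAS: switch false -> true atomically; retry (busy wait) while someone else holds it.
        while (!locked.compareAndSet(false, true)) {
            Thread.yield();
        }
    }

    static void unlock() {
        locked.set(false); // volatile write: publishes the changes made in the critical section
    }

    // Hypothetical helper: `threads` threads increment the counter `iterations` times each.
    static int run(int threads, int iterations) {
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock();
                    try { counter++; } finally { unlock(); }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // 4 threads, 10 000 increments each
    }
}
```

Nothing in this lock orders the waiting threads, so a thread can in principle lose the CAS race indefinitely, which is the starvation risk noted above.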
● Fetch_and_Add: a special instruction that atomically modifies the contents of a memory location. It fetches (copies) the old value of the memory location, adds to it the value given by the second parameter, and returns the old value. Maurice Herlihy (1991) proved that fetch-and-add has a finite consensus number, in contrast to the compare-and-swap operation. The fetch-and-add operation can solve the wait-free consensus problem for no more than two concurrent processes.[1]
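In Java, fetch-and-add is exposed as AtomicInteger.getAndAdd, which returns the value held before the addition. The following sketch (class and method names are hypothetical) hands out ticket numbers to several threads and checks that no ticket is issued twice.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: fetch-and-add used as a ticket dispenser.
class FetchAndAddDemo {
    // Returns how many distinct tickets were handed out in total.
    static int distinctTickets(int threads, int perThread) {
        AtomicInteger next = new AtomicInteger(0);
        Set<Integer> seen = ConcurrentHashMap.newKeySet(); // thread-safe set
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    int ticket = next.getAndAdd(1); // atomically add 1, get the OLD value
                    seen.add(ticket);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Every ticket is unique, so the set holds threads * perThread entries.
        System.out.println(distinctTickets(4, 1_000));
    }
}
```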
Guidelines:
● do not use synchronized blocks if not strictly necessary
● prefer the atomic variables already implemented
● when there is more than one variable to share, wrap critical sections with lock/unlock
● use synchronized only when there is a write operation

Java offers three main mechanisms for ME: atomic variables, implicit locks and explicit locks (reference).

Condition Variables: a thread synchronization mechanism. A thread can suspend its execution while waiting for a specific condition to happen. CVs do not guarantee ME on shared resources, so they have to be used in conjunction with mutexes.
A thread can perform three operations on a CV:
● wait: ConditionVariable c; c.wait(); = the thread is suspended while waiting for a signal to resume it
● signal: c.signal(); = a thread exiting the CS signals a waiting thread in the waiting queue to resume its execution. The awakened thread must check again that the condition it was waiting for has actually happened
● signalAll: c.signalAll(); = wakes up all the waiting threads.
These three methods must be invoked inside a critical section.
Java Monitor: in Java, every object (Object) has three methods, namely wait, notify and notifyAll, which behave like the three previous ones. So, it is possible to implement a mechanism equivalent to condition variables by using these three methods. When dealing with Objects, in order to guarantee ME it is necessary to use the synchronized keyword.

Java ReentrantLock (oracle reference): a reentrant mutual exclusion Lock with the same basic behavior and semantics as the implicit monitor lock accessed using synchronized methods and statements, but with extended capabilities. A ReentrantLock is owned by the thread last successfully locking, but not yet unlocking it. A thread invoking lock will return, successfully acquiring the lock, when the lock is not owned by another thread. The method will return immediately if the current thread already owns the lock. The constructor for this class accepts an optional fairness parameter. When set true, under contention, locks favor granting access to the longest-waiting thread. Otherwise this lock does not guarantee any particular access order. Programs using fair locks accessed by many threads may display lower overall throughput (i.e., are slower; often much slower) than those using the default setting, but have smaller variances in times to obtain locks and guarantee lack of starvation.
It is recommended practice to always immediately follow a call to lock with a try block, releasing the lock in its finally clause.
Since a CV must always be used together with a mutex, the ReentrantLock provides the method newCondition() to obtain a CV, so that when performing a wait on this CV the lock used to create the condition is released and the thread is suspended.
The following are the most relevant methods of ReentrantLock:
● lock()
● tryLock(): acquires the lock only if it is not held by another thread at the time of invocation.
● unlock()
● newCondition()
There is also a method that permits specifying a timeout to observe while trying to lock: tryLock(long timeout, TimeUnit unit)

Barging: usually locks are starvation free, because suspended threads are put in FIFO queues, so every thread entering the queue will acquire the lock within bounded time. However, the JVM may apply a performance optimization: since the suspend/resume mechanism has overhead and wastes time, when there is contention for the same lock the JVM might grant it to a thread that is already running instead of waking up a sleeping one. This is called barging.

Monitor (oracle reference): mutex + condition variable; a synchronization construct that allows threads to have both mutual exclusion and the ability to wait (block) for a certain condition to become true. Only one process at a time is able to enter the monitor. In other words, a monitor is a mechanism that associates with a given data structure a set of procedures/operations that are the only ones allowed. Each procedure/operation is mutually exclusive: only one process/thread can access the monitor at a time.
Two variants of monitor exist:
● Hoare's m.: blocking condition variable; signal-and-wait. There are three kinds of queues: enter (threads aiming to enter the monitor), urgent (threads that handed the monitor lock over to threads that were already waiting on a condition variable) and condition (one queue for each CV). When a thread executes a signal, it suspends itself, moves to the urgent queue and wakes up the first thread waiting in the queue of that variable. Threads waiting in the urgent queue are awakened before the ones in the enter queue.
● MESA: non-blocking CV; signal-and-continue. A signal does not suspend the invoking thread, but moves a thread waiting in the queue of that CV to the enter queue. The signalling thread continues its execution and keeps the monitor until it leaves.
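The points above about ReentrantLock, newCondition() and signal-and-continue can be sketched together in a small bounded buffer. This is only an illustration: the class BoundedBuffer, its capacity of 4 and the demo helper are assumptions made for the example. Each lock() is immediately followed by try/finally, and each wait sits in a while loop because, with signal-and-continue semantics, the awakened thread must re-check its condition.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a monitor: one lock (ME) plus two condition variables.
class BoundedBuffer<T> {
    private final Queue<T> items = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    BoundedBuffer(int capacity) { this.capacity = capacity; }

    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) notFull.await(); // releases the lock while waiting
            items.add(item);
            notEmpty.signal(); // signal-and-continue: we keep the lock until unlock()
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) notEmpty.await(); // re-check the condition after waking up
            T item = items.remove();
            notFull.signal();
            return item;
        } finally {
            lock.unlock();
        }
    }

    // Hypothetical demo: a producer sends 1..n, a consumer sums what it receives.
    static long demo(int n) {
        BoundedBuffer<Integer> buffer = new BoundedBuffer<>(4);
        long[] sum = new long[1];
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) buffer.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) sum[0] += buffer.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(100)); // prints 5050
    }
}
```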
Semaphore: another mechanism for ME; in practice, it is a shared integer variable p>=0 on which increments and decrements are performed by means of two atomic operations:
● V (verhoog = increment, means signal): used when a thread is exiting a portion of code.
● P (prolaag = try to decrease, means wait): used when a thread wants to enter a CS.
In practice, a semaphore allows up to n threads to "enter" a specific portion of code at the same time, just by setting the initial value of p to n.

Readers and Writers problem: n threads want to read and/or write from/to a shared variable, namely a buffer. ME is required since writes to the same memory must be synchronized.
Java provides the ReentrantReadWriteLock class: it maintains a pair of associated locks, one for read-only operations and one for writing, and keeps the capabilities of a ReentrantLock. In particular, lock downgrading is possible: a thread holding the write lock can acquire the read lock and then release the write lock.
Another solution to ME for the readers and writers problem is to synchronize all the methods of a class containing a shared collection, so that the class actually wraps the shared variable and has ME guaranteed.

Concurrent collections (oracle reference): classes that contain thread-safe data structures. They can be:
● blocking: read and write operations (take and put) wait for the data structure to be non-empty or non-full.
● non-blocking

CopyOnWriteArrayList (oracle reference): a thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array. Ordinarily too costly, but may be more efficient than alternatives when traversal operations vastly outnumber mutations, and is useful when you cannot or don't want to synchronize traversals, yet need to preclude interference among concurrent threads.

Deque: "double-ended queue", a queue where an element can be inserted or removed at either the head or the tail of the queue.
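Going back to the semaphore described above, the following sketch uses java.util.concurrent.Semaphore, where acquire() plays the role of P and release() the role of V. The class SemaphoreDemo, the 10 ms sleep and the bookkeeping counters are assumptions added only to make the limit on concurrent entries observable.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a semaphore initialised to n lets at most n threads into a section.
class SemaphoreDemo {
    // Returns the highest number of threads observed inside the section at once.
    static int maxInside(int permits, int threads) {
        Semaphore semaphore = new Semaphore(permits); // initial value p = permits
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    semaphore.acquire(); // P (wait): blocks while p == 0, then decrements p
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = inside.incrementAndGet();
                    max.accumulateAndGet(now, Math::max); // remember the peak
                    Thread.sleep(10);                     // pretend to do some work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inside.decrementAndGet();
                    semaphore.release(); // V (signal): increments p
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return max.get();
    }

    public static void main(String[] args) {
        // With 3 permits the peak never exceeds 3, however many threads compete.
        System.out.println(maxInside(3, 10));
    }
}
```

With permits set to 1 the semaphore degenerates into a mutex.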
Work stealing: suppose there are m producers and n consumers; each producer writes to a shared buffer and each consumer has its own queue, into which work flows from the buffer (according to a specific policy, for example). It might happen that a consumer sees its queue growing because it is not able to consume all the work. In this situation, any other consumer that finds its own queue empty may "steal" work from the queue of that consumer.

JVM Memory: is organized in three main areas:
● method area: stores per-class data such as the run-time constant pool, field and method data, and the code of methods
● heap: all the instances are stored here
● thread stack: for each thread, a data structure is placed here, containing the stack of method frames, the PC register, etc.

Tasks and Executors: the way to manage the execution of threads, in terms of starting, stopping and resuming a thread.

Executor: an object that executes submitted Runnable tasks. This interface provides a way of decoupling task submission from the mechanics of how each task will be run, including details of thread use, scheduling, etc. An Executor is normally used instead of explicitly creating threads.
  void execute(Runnable command)

Runnable: implemented by any class whose instances are intended to be executed by a thread. The class must define a method of no arguments called run.

Callable: a task that returns a result and may throw an exception. Implementors define a single method with no arguments called call. The Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception.

Future: represents the result of an asynchronous computation. Methods are provided to check if the computation is complete, to wait for its completion, and to retrieve the result of the computation. The result can only be retrieved using method get when the computation has completed, blocking if necessary until it is ready. Cancellation is performed by the cancel method. Additional methods are provided to determine if the task completed normally or was cancelled. Once a computation has completed, the computation cannot be cancelled. If you would like to use a Future for the sake of cancellability but not provide a usable result, you can declare types of the form Future<?> and return null as a result of the underlying task.
  boolean cancel(boolean mayInterruptIfRunning)
  boolean isCancelled()
  boolean isDone()
  V get() throws InterruptedException, ExecutionException
  V get(long timeout, TimeUnit unit) throws InterruptedException, ExecutionException, TimeoutException

ExecutorService: an Executor that provides methods to manage termination and methods that can produce a Future for tracking progress of one or more asynchronous tasks. An ExecutorService can be shut down, which will cause it to reject new tasks. Two different methods are provided for shutting down an ExecutorService. The shutdown() method will allow previously submitted tasks to execute before terminating, while the shutdownNow() method prevents waiting tasks from starting and attempts to stop currently executing tasks. Upon termination, an executor has no tasks actively executing, no tasks awaiting execution, and no new tasks can be submitted.
  void shutdown()
  List<Runnable> shutdownNow()
  <T> Future<T> submit(Callable<T> task)
  Future<?> submit(Runnable task)

ThreadPoolExecutor: an ExecutorService that executes each submitted task using one of possibly several pooled threads, normally configured using Executors factory methods.
Thread pools address two different problems: they usually provide improved performance when executing large numbers of asynchronous tasks, due to reduced per-task invocation overhead, and they provide a means of bounding and managing the resources, including threads, consumed when executing a collection of tasks. Each ThreadPoolExecutor also maintains some basic statistics, such as the number of completed tasks.

ScheduledThreadPoolExecutor: a ThreadPoolExecutor that can additionally schedule commands to run after a given delay, or to execute periodically.
  public ScheduledThreadPoolExecutor(int corePoolSize)
  public ScheduledFuture<?> schedule(Runnable command, long delay, TimeUnit unit)
  public <V> ScheduledFuture<V> schedule(Callable<V> callable, long delay, TimeUnit unit)
  public ScheduledFuture<?> scheduleAtFixedRate(Runnable command, long initialDelay, long period, TimeUnit unit)
  public ScheduledFuture<?> scheduleWithFixedDelay(Runnable command, long initialDelay, long delay, TimeUnit unit)
  public void execute(Runnable command)
  public Future<?> submit(Runnable task)
  public <T> Future<T> submit(Callable<T> task)
  public void shutdown()
  public List<Runnable> shutdownNow()

[Figure: task lifecycle]

[Figure: executor lifecycle]

Number of threads in the pool:

$N_{pool} = N_{CPU} \cdot U \cdot \left( \frac{T_{WAIT} + T_{COMPUTE}}{T_{COMPUTE}} \right) + 1$

where $N_{CPU}$ is the number of CPUs, $U$ is the target CPU utilization and the +1 is there for safety reasons: it might happen that all the necessary threads block, and then no more threads could enter the pool.
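As a rough illustration of the executor API and of the pool-size formula above, the sketch below sizes a fixed thread pool, submits Callable tasks, collects their results through Future.get() and finally shuts the pool down. The class ExecutorDemo, the helper names and the rounding up of the formula are assumptions of this example. For instance, with 4 CPUs, U = 1 and equal wait and compute times, the formula gives 4 · 1 · 2 + 1 = 9 threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorDemo {
    // Pool size from the formula in the notes; rounding up is an assumption of this sketch.
    static int poolSize(int nCpu, double utilization, double tWait, double tCompute) {
        return (int) Math.ceil(nCpu * utilization * (tWait + tCompute) / tCompute) + 1;
    }

    // Sums the integers 0 .. tasks*perTask-1, one Callable per chunk of `perTask` numbers.
    static long sumInParallel(int tasks, int perTask) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // CPU-bound work: no waiting time, so T_WAIT = 0.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize(cpus, 1.0, 0.0, 1.0));
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final int start = t * perTask;
                Callable<Long> task = () -> {
                    long partial = 0;
                    for (int i = start; i < start + perTask; i++) partial += i;
                    return partial; // a Callable returns a result, unlike a Runnable
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get(); // blocks until that task has completed
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // already submitted tasks may finish, new ones are rejected
        }
    }

    public static void main(String[] args) {
        System.out.println(poolSize(4, 1.0, 50.0, 50.0)); // prints 9
        System.out.println(sumInParallel(4, 250));        // prints 499500
    }
}
```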
Deadlock with nested monitors: suppose two nested monitors, where a thread first passes the CV of the outer monitor, then enters the inner monitor and waits on a nested CV while still holding the outer monitor. Now suppose a second thread wants to enter the outer monitor: either the first thread already signalled on the outer CV and the signal got lost, or it still has to signal on the outer CV. In either case, the thread waiting outside will never be able to enter the monitor, because the inner thread stays blocked on the inner CV, which will never receive any signal. So, this situation leads to a deadlock.

Deadlock with executor and monitor (Thread Starvation Deadlock): suppose an Executor has a thread pool of size N, and that all its threads encounter the same CV and all wait on it. If there is no other thread available in the pool, then no one will be able to wake up the waiting threads, thus resulting in a deadlock.

Wait-for graph: a directed graph used for deadlock detection in operating systems and relational database systems.

See Coffman.

Memory barrier: a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction; necessary because most modern CPUs employ performance optimizations that can result in out-of-order execution.

Java Memory Model (see the Java Memory Model paper):
● Program order rule: each action in a thread happens-before every action in that thread that comes later in program order.
● Monitor lock rule: an unlock on a monitor happens-before any subsequent lock on the same monitor.
● Volatile variable rule: a write to a volatile variable happens-before any subsequent read of the same variable; the same holds for atomic variables.
● Thread start rule: a call to Thread.start happens-before any action in the started thread.
● Thread termination rule: any action in a thread happens-before any other thread detects that it has terminated, either by a successful return from Thread.join or by Thread.isAlive returning false.
● Interruption rule: a call to interrupt on a thread happens-before the interrupted thread detects the interrupt.
● Finalizer rule: the end of a constructor for an object happens-before the start of its finalizer.
● Transitivity: if A happens-before B and B happens-before C, then A happens-before C.

Performance

With Java, in order to perform accurate performance measurements it is important to know exactly how the code is run by the JVM. An important component is the Just In Time (JIT) compilation engine: this module performs dynamic compilation during the execution of a program (at run time) rather than prior to execution.[1] Most often this consists of translation to machine code, which is then executed directly, but it can also refer to translation to another format. It allows adaptive optimization such as dynamic recompilation, thus in theory JIT compilation can yield faster execution than static compilation. Interpretation and JIT compilation are particularly suited for dynamic programming languages, as the runtime system can handle late-bound data types and enforce security guarantees.
Evaluating the performance of Java code requires collecting statistics about its execution times. In particular, when evaluating multi-threaded software, it is important to have a way to synchronize the starting time of the threads. In other words, all the threads have to start at the same time so that the measurements are faithful to reality. Java supplies a specific class for this task.

CountDownLatch (oracle reference): a synchronization aid that allows one or more threads to wait until a set of operations being performed in other threads completes.
A CountDownLatch is initialized with a given count. The await methods block until the current count reaches zero due to invocations of the countDown() method, after which all waiting threads are released and any subsequent invocations of await return immediately. This is a one-shot phenomenon: the count cannot be reset. If you need a version that resets the count, consider using a CyclicBarrier.
A CountDownLatch is a versatile synchronization tool and can be used for a number of purposes. A CountDownLatch initialized with a count of one serves as a simple on/off latch, or gate: all threads invoking await wait at the gate until it is opened by a thread invoking countDown().
A CountDownLatch initialized to ​N                                      can be used to make one thread wait until ​N threads have completed some action, or some action has been                                        completed N times.  public ​CountDownLatch​(int count)  public void ​await​() throws​ ​InterruptedException  public void ​countDown​()    CyclicBarrier (​oracle reference​): a synchronization aid that allows a set of threads to all wait for each other to                                      reach a common barrier point. CyclicBarriers are useful in programs involving a ​fixed sized party of threads                                  that must occasionally wait for each other​. The barrier is called ​cyclic because it can be re­used after the                                      waiting threads are released.  public ​CyclicBarrier​(int parties, ​Runnable​ barrierAction)  public int ​await​() throws​ ​InterruptedException​, ​BrokenBarrierException    Performance in synchronization  Synchronization impacts performance because of:  ● context switches  ● memory synchronization  ● thread synchronization  Considering our code, one of its performance indexes is throughput, in terms of number threads entering and                                  leaving it in a given time. So, here comes to us ​Little’s Law​: where, in our case, L represents the                            λ w L =                   number of threads in our system, i.e. the number of threads waiting to execute our portion of code (waiting                                      because of synchronization of course), λ is the arrival rate and w the delay of the system. What we want to do                                            is ​minimize ​L.  Possible solutions:  ● CS shrinking​: reduce the size of the CS so that a thread do not spend too much time in it.  ● CS splitting​: split a CS in smaller ones, so perform ​lock splitting​.  ● JVM optimizations​: 
  14. 14. ○ Lock coarsening: whenever the JVM observes that a thread moves from one waiting queue to another (because of a chain of locks) always in the same sequential order, it collapses all the CSs into a single one, thus merging the locks into a single one. This reduces waiting time and so improves throughput.
○ Lock elision: if a CS is always executed by the same thread, then it is not necessary to protect it with a lock, so the JVM removes it.
● Lock granularity: when the shared data structure is wide, it is important to put locks only where needed, and not on the entire data structure. For example, a HashMap could be so wide that concurrent threads would access different parts of it, thus not violating any ME constraint. Locking the entire table would then penalize the performance of our software. A solution is to distribute several locks over the data structure. Suppose N_L = number of locks and N = table size; then the locks distribution is given by N % N_L. N_L should be dimensioned so that the conflict probability is minimized.
● Non-blocking algorithms: an algorithm is non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread [1] (use of volatile and atomic variables).
○ Wait-free: the strongest non-blocking guarantee of progress, combining guaranteed system-wide throughput with starvation-freedom.
An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes. [11]
○ Lock-free: allows individual threads to starve but guarantees system-wide throughput. An algorithm is lock-free if, when the program threads are run sufficiently long, at least one of the threads makes progress (for some sensible definition of progress). All wait-free algorithms are lock-free. An algorithm is lock-free if every operation has a bound on the number of steps before one of the threads operating on a data structure completes its operation. [11]
○ Obstruction-free: the weakest natural non-blocking progress guarantee. An algorithm is obstruction-free if at any point, a single thread executed in isolation (i.e., with all obstructing threads suspended) for a bounded number of steps will complete its operation. [11] All lock-free algorithms are obstruction-free.
See Treiber's stack and Michael and Scott's queue. These two algorithms are non-blocking implementations of a stack and a queue respectively. Both are based on AtomicReference variables, which make it possible to handle thread synchronization. They perform better in programs that suffer from high contention rates. Basically, they implement a spinning solution to deal with concurrent modification of the stack pointer and of the head and tail references in the queue.
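For concreteness, here is a minimal sketch of a Treiber-style stack built on AtomicReference. It follows the commonly published form of the algorithm and is not necessarily the exact code referred to in these notes; the class and field names are illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative non-blocking stack; names are assumptions of this sketch.
class TreiberStack<E> {
    private static final class Node<E> {
        final E item;
        Node<E> next;
        Node(E item) { this.item = item; }
    }

    // The "stack pointer": always refers to the current top node (or null).
    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    void push(E item) {
        Node<E> newHead = new Node<>(item);
        Node<E> oldHead;
        do {
            oldHead = top.get();
            newHead.next = oldHead;
            // Succeeds only if top is still the node we read; otherwise spin.
        } while (!top.compareAndSet(oldHead, newHead));
    }

    // Returns the top item, or null if the stack is empty.
    E pop() {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = top.get();
            if (oldHead == null) {
                return null;
            }
            newHead = oldHead.next;
        } while (!top.compareAndSet(oldHead, newHead));
        return oldHead.item;
    }
}
```

No thread ever holds a lock here: a thread that loses the race on compareAndSet simply re-reads the top and retries.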
Thus, whenever a thread tries to modify one of them, it has to be sure (see the if statements) that it is actually modifying what it expects to modify (note the use of compareAndSet). If such a check fails, the thread goes back (spins) and tries again from the beginning.

JAVA: Nested Classes (oracle reference)
A nested class is a member of its enclosing class. Non-static nested classes (inner classes) have access to other members of the enclosing class, even if they are declared private. Static nested classes cannot refer directly to instance members of the enclosing class; they can use them only through an object reference. As a member of the OuterClass, a nested class can be declared private, public, protected, or package private. (Recall that outer classes can only be declared public or package private.)
  15. 15. Static nested class: a static nested class interacts with the instance members of its outer class (and other classes) just like any other top-level class. In effect, a static nested class is behaviorally a top-level class that has been nested in another top-level class for packaging convenience.

Inner class: is associated with an instance of its enclosing class and has direct access to that object's methods and fields. Also, because an inner class is associated with an instance, it cannot define any static members itself (this restriction was lifted in Java 16). Objects that are instances of an inner class exist within an instance of the outer class.

To instantiate an inner class, you must first instantiate the outer class. Then, create the inner object within the outer object with this syntax:
OuterClass.InnerClass innerObject = outerObject.new InnerClass();

Local classes (oracle reference): are classes that are defined in a block, which is a group of zero or more statements between balanced braces. You typically find local classes defined in the body of a method. A local class has access to the members of its enclosing class and to local variables. However, a local class can only access local variables that are declared final. Starting in Java SE 8, a local class can access local variables and parameters of the enclosing block that are final or effectively final (i.e. never changed in the enclosing block).
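The three kinds of nested class described so far can be compared side by side. The sketch below uses made-up names (Outer, StaticNested, Inner, Adder) purely for illustration.

```java
// Illustrative classes; all names are assumptions of this sketch.
class Outer {
    private int counter = 10;                 // private instance member
    private static String label = "outer";    // private static member

    static class StaticNested {
        // Reaches instance members only through an explicit Outer reference.
        String describe(Outer o) {
            return label + ":" + o.counter;
        }
    }

    class Inner {
        // Tied to one Outer instance: touches its private field directly.
        int next() {
            return ++counter;
        }
    }

    int sumWithLocal(int base) {
        int offset = 5;                       // effectively final: never reassigned
        class Adder {                         // local class inside a method body
            int apply() {
                return base + offset + counter;
            }
        }
        return new Adder().apply();
    }
}
```

An inner object is then created from an existing outer object, for example `Outer.Inner inner = outer.new Inner();`, whereas `new Outer.StaticNested()` needs no outer instance at all.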
Shadowing: if a declaration of a type (such as a member variable or a parameter name) in a particular scope (such as an inner class or a method definition) has the same name as another declaration in the enclosing scope, then the declaration shadows the declaration of the enclosing scope.
Anonymous classes: enable you to declare and instantiate a class at the same time. They are like local classes except that they do not have a name. Use them if you need to use a local class only once. Example: writing new followed by an interface name and then opening { to write the implementing class "on the fly".
AtomicInteger (oracle reference): an int value that may be updated atomically. Methods: getAndAdd(int delta), getAndIncrement(), addAndGet(int delta), incrementAndGet().
ThreadLocal<T> (oracle reference): this class provides thread-local variables. These variables differ from their normal counterparts in that each thread that accesses one (via its get or set method) has its own, independently initialized copy of the variable. ThreadLocal instances are typically private static fields in classes that wish to associate state with a thread (e.g., a user ID or transaction ID).

MESSAGE PASSING MODEL
Message passing between a pair of processes can be supported by two message communication operations, send and receive, defined in terms of destinations and messages. A queue is associated with each message destination. Sending processes cause messages to be added to remote queues and receiving processes remove messages from local queues.
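The queue-per-destination idea can be sketched inside a single JVM, with threads standing in for processes. This is only a model under that assumption: the class name Mailbox is hypothetical, and a real system would move the message across a network rather than share a queue object.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-process model: one queue per message destination.
class Mailbox {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // send: the sender adds the message to the destination's queue and continues.
    void send(String message) {
        queue.add(message);
    }

    // receive: the receiver removes the oldest message, blocking while the queue is empty.
    String receive() {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while receiving", e);
        }
    }
}
```

Here send never waits for the receiver, while receive waits for a message to be available.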
Sending and receiving processes may be either synchronous or asynchronous. In the synchronous form of communication, the sending and receiving processes synchronize at every message. In this case, both send and receive are blocking operations.
In the asynchronous form of communication, the use of the send operation is nonblocking in that the sending process is allowed to proceed as soon as the message has been copied to a local buffer, and the transmission of the message proceeds in parallel with the sending process. The receive operation can have blocking and
  16. 16. non-blocking variants. In the non-blocking variant, the receiving process proceeds with its program after issuing a receive operation, which provides a buffer to be filled in the background, but it must separately receive notification that its buffer has been filled, by polling or interrupt.
Non-blocking communication appears to be more efficient, but it involves extra complexity in the receiving process associated with the need to acquire the incoming message out of its flow of control. For these reasons, today's systems do not generally provide the nonblocking form of receive.
Communication channel hypothesis:
● Reliability: in terms of validity and integrity. As far as the validity property is concerned, a point-to-point message service can be described as reliable if messages are guaranteed to be delivered despite a 'reasonable' number of packets being dropped or lost. In contrast, a point-to-point message service can be described as unreliable if messages are not guaranteed to be delivered in the face of even a single packet dropped or lost. For integrity, messages must arrive uncorrupted and without duplication.
● Ordering: some applications require that messages be delivered in sender order, that is, the order in which they were transmitted by the sender. The delivery of messages out of sender order is regarded as a failure by such applications.
● QoS
● Queue policy

Processes addressing: in the Internet protocols, messages are sent to (Internet address, local port) pairs. A local port is a message destination within a computer, specified as an integer.
A port has exactly one receiver (multicast ports are an exception) but can have many senders. An alternative way to address processes is to use the values (Process ID, local port).

Distributed systems models:
● Physical: focus on hardware organization.
● Architectural (behavioural): how the different components interact (client-server, peer-to-peer).
● Fundamental (abstract): a high-level abstraction that makes it possible to represent the real system mathematically, so as to validate hypotheses about it.

Bounded-buffer with asynchronous message passing and Dijkstra's guarded commands: the guarded command is the most important element of the guarded command language. In a guarded command, just as the name says, the command is "guarded". The guard is a proposition, which must be true before the statement is executed. At the start of that statement's execution, one may assume the guard to be true. Also, if the guard is false, the statement will not be executed. The use of guarded commands makes it easier to prove that the program meets the specification. The statement is often another guarded command.
A guard can be in one of these three states:
● failed: condition is false
● valid: condition true, result received
● delayed: condition true, but no results available yet
See notes from Professor about the algorithm for the bounded-buffer (producer-consumer) problem solved with guarded commands.
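To make the three guard states concrete, the following is a rough single-threaded model of a bounded-buffer server with two guarded inputs. It is not the Professor's algorithm: the class GuardedBuffer, the fields putChannel, pendingGets and replies, and the fixed evaluation order of the guards are all assumptions of this sketch (Dijkstra's construct may pick any valid guard).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a buffer process with two guarded receive commands.
class GuardedBuffer {
    enum GuardState { FAILED, VALID, DELAYED }

    private final int capacity;
    private final Deque<Integer> items = new ArrayDeque<>();
    final Deque<Integer> putChannel = new ArrayDeque<>(); // items sent by producers, not yet accepted
    int pendingGets = 0;                                  // get requests sent by consumers
    final Deque<Integer> replies = new ArrayDeque<>();    // items delivered to consumers

    GuardedBuffer(int capacity) { this.capacity = capacity; }

    // Guard "buffer not full; receive put": failed if full, delayed if no message yet.
    GuardState putGuard() {
        if (items.size() >= capacity) return GuardState.FAILED;
        return putChannel.isEmpty() ? GuardState.DELAYED : GuardState.VALID;
    }

    // Guard "buffer not empty; receive get": failed if empty, delayed if no request yet.
    GuardState getGuard() {
        if (items.isEmpty()) return GuardState.FAILED;
        return pendingGets == 0 ? GuardState.DELAYED : GuardState.VALID;
    }

    // One iteration of the server loop: runs one command whose guard is valid.
    // Returns false when no guard is valid (the server would wait).
    boolean step() {
        if (putGuard() == GuardState.VALID) {
            items.addLast(putChannel.removeFirst());
            return true;
        }
        if (getGuard() == GuardState.VALID) {
            pendingGets--;
            replies.addLast(items.removeFirst());
            return true;
        }
        return false;
    }
}
```

With capacity 1, a second queued put stays pending (its guard is failed) until a get request arrives and frees the slot.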
  17. 17. Message Passing Interface (MPI, a reference): the first standardized, vendor-independent message passing library. The advantages of developing message passing software using MPI closely match the design goals of portability, efficiency, and flexibility. MPI is not an IEEE or ISO standard, but it has in fact become the "industry standard" for writing message passing programs on HPC platforms.
MPI primarily addresses the message-passing parallel programming model: data is moved from the address space of one process to that of another process through cooperative operations on each process.
MPI main components and features:
● Communicators and Groups: MPI uses objects called communicators and groups to define which collection of processes may communicate with each other. MPI_COMM_WORLD is the predefined communicator that includes all of your MPI processes.
○ A group is an ordered set of processes. Each process in a group is associated with a unique integer rank. Rank values start at zero and go to N-1, where N is the number of processes in the group. In MPI, a group is represented within system memory as an object. It is accessible to the programmer only by a "handle". A group is always associated with a communicator object.
○ A communicator encompasses a group of processes that may communicate with each other. All MPI messages must specify a communicator. In the simplest sense, the communicator is an extra "tag" that must be included with MPI calls.
○ From the programmer's perspective, a group and a communicator are one.
● Rank: within a communicator, every process has its own unique integer identifier, assigned by the system when the process initializes. A rank is sometimes also called a "task ID". Ranks are contiguous and begin at zero. The rank is used by the programmer to specify the source and destination of messages, and is often used conditionally by the application to control program execution (if rank=0 do this / if rank=1 do that).
● MPI buffer: since send and receive are rarely perfectly synchronized, the MPI architecture presents a library buffer that is used to store messages in transit while the receiver cannot receive them.
● Blocking and Nonblocking operations: blocking is interpreted as 'blocked until it is safe to return', in the sense that application data has been copied into the MPI system and hence is in transit or delivered, and therefore the application buffer can be reused. Safe means that modifications will not affect the data intended for the receive task. Safe does not imply that the data was actually received: it may very well be sitting in a system buffer. Non-blocking send and receive routines behave similarly: they will return almost immediately. They do not wait for any communication events to complete, such as message copying from user memory to system buffer space or the actual arrival of the message.
● Point-to-Point Communication: message passing between two, and only two, different MPI tasks. One task is performing a send operation and the other task is performing a matching receive operation.
● Collective Communication: involves all processes within the scope of a communicator. It is the programmer's responsibility to ensure that all processes within a communicator participate in any collective operations. Unexpected behavior, including program failure, can occur if even one task in the communicator doesn't participate.
● Types of Collective Operations:
  18. 18.
○ Synchronization: processes wait until all members of the group have reached the synchronization point.
○ Data Movement: broadcast, scatter/gather, all to all.
○ Collective Computation (reductions): one member of the group collects data from the other members and performs an operation (min, max, add, multiply, etc.) on that data.
● Order:
○ MPI guarantees that messages will not overtake each other.
○ Order rules do not apply if there are multiple threads participating in the communication operations.
● Fairness: MPI does not guarantee fairness; it's up to the programmer to prevent "operation starvation".
● Envelope: source + destination + tag + communicator (see later)

The underlying architectural model for MPI is relatively simple and captured in Figure 4.17; note the added dimension of explicitly having MPI library buffers in both the sender and the receiver, managed by the MPI library and used to hold data in transit.
  19. 19. MPI point-to-point routines:
Buffer = Program (application) address space that references the data that is to be sent or received.
Count = Indicates the number of data elements of a particular type to be sent.
Type = For reasons of portability, MPI predefines its elementary data types.
Dest = The rank of the receiving process.
Source = The rank of the sending process. This may be set to the wild card MPI_ANY_SOURCE to receive a message from any task.
Tag = Arbitrary non-negative integer assigned by the programmer to uniquely identify a message. Send and receive operations should match message tags. For a receive operation, the wild card MPI_ANY_TAG can be used to receive any message regardless of its tag.
Communicator = Communication context, or set of processes for which the source or destination fields are valid.
Status = For a receive operation, indicates the source of the message and the tag of the message.
Request = Used by non-blocking send and receive operations. The system issues a unique "request number".
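The source, destination, tag and communicator arguments together form the message envelope, and a receive is matched against it. The sketch below models that matching rule in plain Java for illustration only: it does not call any MPI library, the class Envelope is hypothetical, and the constants ANY_SOURCE and ANY_TAG are stand-ins for MPI_ANY_SOURCE and MPI_ANY_TAG whose real values are defined by each MPI implementation.

```java
// Hypothetical model of envelope matching; these are not MPI library calls.
final class Envelope {
    static final int ANY_SOURCE = -1;  // stand-in for MPI_ANY_SOURCE (assumed value)
    static final int ANY_TAG = -1;     // stand-in for MPI_ANY_TAG (assumed value)

    final int source;          // rank of the sending process
    final int dest;            // rank of the receiving process
    final int tag;             // non-negative tag chosen by the programmer
    final String communicator; // name standing in for a communicator handle

    Envelope(int source, int dest, int tag, String communicator) {
        this.source = source;
        this.dest = dest;
        this.tag = tag;
        this.communicator = communicator;
    }

    // True if a receive posted by rank `receiver`, with the given source and tag
    // selectors on communicator `comm`, would accept this message.
    boolean matches(int receiver, int wantedSource, int wantedTag, String comm) {
        return dest == receiver
            && communicator.equals(comm)
            && (wantedSource == ANY_SOURCE || wantedSource == source)
            && (wantedTag == ANY_TAG || wantedTag == tag);
    }
}
```

The wild cards relax only the source and tag checks; the communicator must always agree.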
  20. 20. TIME AND GLOBAL STATES
Clocks, events and process states
We define an event to be the occurrence of a single action that a process carries out as it executes: a communication action or a state-transforming action. The sequence of events within a single process p_i can be placed in a single, total ordering, which we denote by the relation ➝_i between the events. That is, e ➝_i e' if and only if the event e occurs before e' at p_i. This ordering is well defined, whether or not the process is multithreaded, since we have assumed that the process executes on a single processor. Now we can define the history of process p_i to be the series of events that take place within it, ordered as we have described by the relation ➝_i:
history(p_i) = h_i = < e_i^0, e_i^1, e_i^2, … >

The operating system reads the node's hardware clock value, H_i(t), scales it and adds an offset so as to produce a software clock C_i(t) = α H_i(t) + β that approximately measures real, physical time t for process p_i. In other words, when the real time in an absolute frame of reference is t, C_i(t) is the reading on the software clock.
Successive events will correspond to different timestamps only if the clock resolution (the period between updates of the clock value) is smaller than the time interval between successive events.

SKEW between computer clocks in a distributed system: the instantaneous difference between the readings of any two clocks is called their skew.

Clock DRIFT: clocks count time at different rates, and so diverge.
Clock's DRIFT RATE: the change in the offset between the clock and a nominal perfect reference clock per unit of time measured by the reference clock.

UTC (Coordinated Universal Time): is based on atomic time, but a so-called 'leap second' is inserted (or, more rarely, deleted) occasionally to keep it in step with astronomical time. UTC signals are synchronized and broadcast regularly from land-based radio stations and satellites covering many parts of the world.

Synchronizing Physical Clocks
Synchronization in a synchronous system: in general, for a synchronous system, the optimum bound that can be achieved on clock skew when synchronizing N clocks is u * (1 - 1/N) [Lundelius and Lynch 1984], where u = max - min, and max and min are the upper and lower bounds on the time taken to transmit a message in a synchronous system.

Cristian's method for synchronizing clocks: use of a time server, connected to a device that receives signals from a source of UTC, to synchronize computers externally. Although there is no upper bound on message transmission delays in an asynchronous system, the round-trip times for messages exchanged between pairs of processes are often reasonably short (a small fraction of a second). Cristian describes the algorithm as probabilistic: the method achieves synchronization only if the observed round-trip times between client and server are sufficiently short compared with the required accuracy. A simple estimate of the time to which p
  21. 21. should set its clock is t + T_round / 2, which assumes that the elapsed time is split equally before and after S placed t in m_t (the message carrying the timestamp for synchronization). This is normally a reasonably accurate assumption, unless the two messages are transmitted over different networks.

Discussion: Cristian's method suffers from the problem associated with all services implemented by a single server: the single time server might fail and thus render synchronization temporarily impossible. Cristian suggested, for this reason, that time should be provided by a group of synchronized time servers, each with a receiver for UTC time signals. Dolev et al. [1986] showed that if f is the number of faulty clocks out of a total of N, then we must have N > 3f if the other, correct, clocks are still to be able to achieve agreement.

Berkeley's Algorithm: an algorithm for internal synchronization developed for collections of computers running Berkeley UNIX. A coordinator computer is chosen to act as the master. Unlike in Cristian's protocol, this computer periodically polls the other computers whose clocks are to be synchronized, called slaves. The slaves send back their clock values to it. The master estimates their local clock times by observing the round-trip times (similarly to Cristian's technique), and it averages the values obtained (including its own clock's reading). The balance of probabilities is that this average cancels out the individual clocks' tendencies to run fast or slow.
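Both estimates can be worked through numerically. The sketch below is only an illustration under simplifying assumptions: the class ClockSync and its method names are made up, times are plain doubles in arbitrary units, each slave's current clock is estimated as its reported value plus half the observed round trip, and the result is expressed as a per-clock adjustment (master first).

```java
// Hypothetical helper; names and units are assumptions of this sketch.
class ClockSync {
    // Cristian-style estimate: the time t carried in the reply plus half the round trip.
    static double cristianEstimate(double serverTime, double roundTrip) {
        return serverTime + roundTrip / 2.0;
    }

    // Berkeley-style averaging. Index 0 of the result is the master's adjustment,
    // index i + 1 is the adjustment for slave i.
    static double[] berkeleyAdjustments(double masterTime, double[] slaveTimes, double[] roundTrips) {
        int n = slaveTimes.length;
        double[] estimates = new double[n + 1];
        estimates[0] = masterTime;
        double sum = masterTime;
        for (int i = 0; i < n; i++) {
            // Estimate each slave's current clock from its reply and the round trip.
            estimates[i + 1] = cristianEstimate(slaveTimes[i], roundTrips[i]);
            sum += estimates[i + 1];
        }
        double average = sum / (n + 1);   // includes the master's own reading
        double[] adjustments = new double[n + 1];
        for (int i = 0; i <= n; i++) {
            adjustments[i] = average - estimates[i];
        }
        return adjustments;
    }
}
```

For instance, with a master reading of 100 and slaves reporting 103 and 107 over round trips of 2, the estimates are 100, 104 and 108, the average is 104, and the adjustments are +4, 0 and -4.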
The accuracy of the protocol depends upon a nominal maximum round-trip time between the master and the slaves.
The master takes a FAULT-TOLERANT AVERAGE. That is, a subset is chosen of clocks that do not differ from one another by more than a specified amount, and the average is taken of readings from only these clocks.

NTP: Cristian's method and the Berkeley algorithm are intended primarily for use within intranets. NTP's chief design aims and features are as follows:
● To provide a service enabling clients across the Internet to be synchronized accurately to UTC: although large and variable message delays are encountered in Internet communication, NTP employs statistical techniques for the filtering of timing data and it discriminates between the quality of timing data from different servers.
● To provide a reliable service that can survive lengthy losses of connectivity: there are redundant servers and redundant paths between the servers. The servers can reconfigure so as to continue to provide the service if one of them becomes unreachable.
● To enable clients to resynchronize sufficiently frequently to offset the rates of drift found in most computers: the service is designed to scale to large numbers of clients and servers.
● To provide protection against interference with the time service, whether malicious or accidental: the time service uses authentication techniques to check that timing data originate from the claimed trusted sources. It also validates the return addresses of messages sent to it.
Logical Time and Logical Clocks
As Lamport [1978] pointed out, since we cannot synchronize clocks perfectly across a distributed system, we cannot in general use physical time to find out the order of any arbitrary pair of events occurring within it. The ordering can instead be based on two simple and intuitively obvious points:
  22. 22. ● If two events occurred at the same process p_i (i = 1, 2, ..., N), then they occurred in the order in which p_i observes them; this is the order ➝_i that we defined above.
● Whenever a message is sent between processes, the event of sending the message occurred before the event of receiving the message.

Lamport called the partial ordering obtained by generalizing these two relationships the happened-before relation. It is also sometimes known as the relation of causal ordering or potential causal ordering.
The sequence of events need not be unique.

For example, a ↛ e and e ↛ a, since they occur at different processes, and there is no chain of messages intervening between them. We say that events such as a and e that are not ordered by ➝ are concurrent and write this a || e.

Logical clocks: Lamport [1978] invented a simple mechanism by which the happened-before ordering can be captured numerically, called a logical clock. A Lamport logical clock is a monotonically increasing software counter, whose value need bear no particular relationship to any physical clock. Each process p_i keeps its own logical clock, L_i, which it uses to apply so-called Lamport timestamps to events. We denote the timestamp of event e at p_i by L_i(e), and by L(e) we denote the timestamp of event e at whatever process it occurred at.
To capture the happened-before relation ➝, processes update their logical clocks and transmit the values of their logical clocks in messages as follows:
● LC1: L_i is incremented before each event is issued at process p_i: L_i := L_i + 1.
● LC2:
(a) When a process p_i sends a message m, it piggybacks on m the value t = L_i.
(b) On receiving (m, t), a process p_j computes L_j := max(L_j, t) and then applies LC1 before timestamping the event receive(m).
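Rules LC1 and LC2 translate almost directly into code. The following is a minimal sketch; the class name LamportClock and its method names are illustrative, and message transport is left out so that only the clock arithmetic is shown.

```java
// Illustrative Lamport clock for one process; names are assumptions of this sketch.
class LamportClock {
    private long time = 0;

    // LC1: increment before each local event; returns that event's timestamp.
    synchronized long tick() {
        return ++time;
    }

    // LC2(a): sending is itself an event (LC1 applies); the returned value t
    // is what gets piggybacked on the outgoing message.
    synchronized long send() {
        return ++time;
    }

    // LC2(b): take the maximum of the local clock and t, then apply LC1
    // before timestamping the receive event.
    synchronized long receive(long t) {
        time = Math.max(time, t);
        return ++time;
    }
}
```

Because the receive timestamp is always greater than the piggybacked value, a send is always stamped lower than its matching receive.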
  23. 23. Note: e ➝ e' ⇒ L(e) < L(e').
The converse is not true. If L(e) < L(e'), then we cannot infer that e ➝ e'.

Totally ordered logical clocks: some pairs of distinct events, generated by different processes, have numerically identical Lamport timestamps. We can create a total order on the set of events (that is, one for which all pairs of distinct events are ordered) by taking into account the identifiers of the processes at which events occur.
If e is an event occurring at p_i with local timestamp T_i, and e' is an event occurring at p_j with local timestamp T_j, we define the global logical timestamps for these events to be (T_i, i) and (T_j, j), respectively. (T_i, i) < (T_j, j) if and only if either T_i < T_j, or T_i = T_j and i < j.

Vector clocks: Mattern [1989] and Fidge [1991] developed vector clocks to overcome the shortcoming of Lamport's clocks: the fact that from L(e) < L(e') we cannot conclude that e ➝ e'.
A vector clock for a system of N processes is an array of N integers. Each process keeps its own vector clock, V_i, which it uses to timestamp local events. There are simple rules for updating the clocks:
● VC1: initially, V_i(j) = 0 for i, j = 1, 2, ..., N.
● VC2: just before p_i timestamps an event, it sets V_i(i) := V_i(i) + 1.
● VC3: p_i includes the value t = V_i in every message it sends.
● VC4: when p_i receives a timestamp t in a message, it sets V_i(j) := max(V_i(j), t(j)) for j = 1, 2, ..., N.

For a vector clock V_i, V_i(i) is the number of events that p_i has timestamped, and V_i(j) (j ≠ i) is the number of events that have occurred at p_j that have potentially affected p_i. (Process p_j may have timestamped more events by this point, but no information has flowed to p_i about them in messages as yet.)
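A minimal sketch of a vector clock follows, assuming the usual update rules: a process increments its own entry before timestamping an event, piggybacks its whole vector on each message, and on receipt takes the component-wise maximum before timestamping the receive event. The class VectorClock and its method names are illustrative, and process indices start at 0 here.

```java
// Illustrative vector clock for one process; names are assumptions of this sketch.
class VectorClock {
    private final int[] v;     // one entry per process, initially all zero
    private final int self;    // index of the owning process (0-based)

    VectorClock(int processes, int self) {
        this.v = new int[processes];
        this.self = self;
    }

    // Local or send event: increment own entry, return a copy as the timestamp.
    int[] tick() {
        v[self]++;
        return v.clone();
    }

    // Receive event: merge the piggybacked vector, then count the receive itself.
    int[] receive(int[] t) {
        for (int j = 0; j < v.length; j++) {
            v[j] = Math.max(v[j], t[j]);
        }
        v[self]++;
        return v.clone();
    }

    // a <= b if and only if every component of a is <= the same component of b.
    static boolean lessOrEqual(int[] a, int[] b) {
        for (int j = 0; j < a.length; j++) {
            if (a[j] > b[j]) return false;
        }
        return true;
    }

    // a < b if and only if a <= b and a differs from b.
    static boolean happenedBefore(int[] a, int[] b) {
        return lessOrEqual(a, b) && !java.util.Arrays.equals(a, b);
    }

    // Concurrent if neither timestamp is <= the other.
    static boolean concurrent(int[] a, int[] b) {
        return !lessOrEqual(a, b) && !lessOrEqual(b, a);
    }
}
```

Unlike Lamport timestamps, comparing two vector timestamps is enough to decide whether one event happened before the other or whether they are concurrent.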
  24. 24. Figure 14.7 shows the vector timestamps of the events of Figure 14.5. It can be seen, for example, that V(a) < V(f), which reflects the fact that a ➝ f. Similarly, we can tell when two events are concurrent by comparing their timestamps. For example, that c || e can be seen from the facts that neither V(c) ≤ V(e) nor V(e) ≤ V(c).
Vector timestamps have the disadvantage, compared with Lamport timestamps, of taking up an amount of storage and message payload that is proportional to N, the number of processes.

Global states: the problem of finding out whether a particular property is true of a distributed system as it executes.
● Distributed garbage collection: an object is considered to be garbage if there are no longer any references to it anywhere in the distributed system. To check that an object is garbage, we must verify that there are no references to it anywhere in the system. When we consider properties of a system, we must include the state of communication channels as well as the state of the processes.
● Distributed deadlock detection: a distributed deadlock occurs when each of a collection of processes waits for another process to send it a message, and where there is a cycle in the graph of this 'waits-for' relationship.
● Distributed termination detection: the phenomena of termination and deadlock are similar in some ways, but they are different problems. First, a deadlock may affect only a subset of the processes in a system, whereas all processes must have terminated.
Second, process ​passivity is not the same as                              waiting in a deadlock cycle: a deadlocked process is attempting to perform a further action, for which                                  another process waits; a passive process is not engaged in any activity.  ● Distributed debugging​: Distributed systems are complex to debug.   
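The deadlock item in the list above comes down to finding a cycle in the waits-for graph. The following is a minimal sketch of that last step only, assuming the graph has already been gathered into one place as a dictionary (collecting it consistently across processes is the hard part and is not shown). The function name `find_wait_cycle` and the process names are hypothetical.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of processes in the waits-for graph, or None if there is none.

    `waits_for[p]` lists the processes that p is waiting for (assumed input format).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    nodes = set(waits_for) | {q for targets in waits_for.values() for q in targets}
    colour = {p: WHITE for p in nodes}
    path: List[str] = []

    def visit(p: str) -> Optional[List[str]]:
        colour[p] = GREY
        path.append(p)
        for q in waits_for.get(p, []):
            if colour[q] == GREY:
                # q is already on the current path: the path from q onwards is a cycle
                return path[path.index(q):]
            if colour[q] == WHITE:
                cycle = visit(q)
                if cycle is not None:
                    return cycle
        colour[p] = BLACK
        path.pop()
        return None

    for p in sorted(nodes):  # sorted only to make the result deterministic
        if colour[p] == WHITE:
            cycle = visit(p)
            if cycle is not None:
                return cycle
    return None


deadlocked = {"p1": ["p2"], "p2": ["p3"], "p3": ["p1"], "p4": ["p1"]}
print(find_wait_cycle(deadlocked))                     # ['p1', 'p2', 'p3']
print(find_wait_cycle({"p1": ["p2"], "p2": ["p3"]}))   # None
```

In the first graph p4 waits for a deadlocked process but is not itself part of the cycle, which matches the remark that a deadlock may affect only a subset of the processes.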
Global states and consistent cuts: The essential problem is the absence of global time.
A global state corresponds to initial prefixes of the individual process histories. A cut of the system's execution is a subset of its global history that is a union of prefixes of process histories.
The leftmost cut is inconsistent. This is because at p_2 it includes the receipt of the message m_1, but at p_1 it does not include the sending of that message. This shows an 'effect' without a 'cause'. The actual execution never was in a global state corresponding to the process states at that frontier, and we can in principle tell this by examining the → relation between events. By contrast, the rightmost cut is consistent.

INTERPROCESS COMMUNICATION
Direct comm. (direct naming): unique names are given to all processes comprising a program.
● symmetrical direct naming: both the sender and the receiver name the corresponding process.
● asymmetrical direct naming: only the sender names the receiver; the receiver can receive messages from any process.
Indirect comm. (indirect naming): uses intermediaries called channels or mailboxes.
● symmetrical indirect naming: both the sender and the receiver name the corresponding channel.
● asymmetrical indirect naming: the receiver can receive messages from any channel.

Request/Reply protocol: a requestor sends a request message to a replier system, which receives and processes the request and ultimately returns a message in response. This is a simple but powerful messaging pattern which allows two applications to have a two-way conversation with one another over a channel. The pattern is especially common in client-server architectures.[1]
For simplicity, this pattern is typically implemented in a purely synchronous fashion, as in web service calls over HTTP, which hold a connection open and wait until the response is delivered or the timeout period expires. However, request-response may also be implemented asynchronously, with a response being returned at some unknown later time. This is often referred to as "sync over async", or "sync/async", and is common in enterprise application integration (EAI) implementations where slow aggregations, time-intensive functions, or human workflow must be performed before a response can be constructed and delivered.

Marshalling and Unmarshalling: The information stored in running programs is represented as data structures, whereas the information in messages consists of sequences of bytes.
Irrespective of the form of communication used, the data structures must be flattened (converted to a sequence of bytes) before transmission and rebuilt on arrival. Data representation differs from one computer to another, so when communicating, the following must be addressed:
● primitive data representation (such as integers and floating-point numbers).
● the set of codes used to represent characters (ASCII or Unicode).

There are two ways of enabling any two computers to exchange binary data values:
● The values are converted to an agreed external format before transmission and converted to the local form on receipt.
● The values are transmitted in the sender's format, together with an indication of the format used, and the recipient converts the values if necessary.
An agreed standard for the representation of data structures and primitive values is called an external data representation.
Marshalling is the process of taking a collection of data items and assembling them into a form suitable for transmission in a message. Unmarshalling is the process of disassembling them on arrival to produce an equivalent collection of data items at the destination. Thus marshalling consists of the translation of structured data items and primitive values into an external data representation. Similarly, unmarshalling consists of the generation of primitive values from their external data representation and the rebuilding of the data structures.
Three alternative approaches to external data representation and marshalling:
● CORBA's common data representation, which is concerned with an external representation for the structured and primitive types that can be passed as the arguments and results of remote method invocations in CORBA.
● Java's object serialization, which is concerned with the flattening and external data representation of any single object or tree of objects that may need to be transmitted in a message or stored on a disk.
● XML (Extensible Markup Language), which defines a textual format for representing structured data.
In the first two approaches, the primitive data types are marshalled into a binary form. In the third approach (XML), the primitive data types are represented textually. The textual representation of a data value will generally be longer than the equivalent binary representation. The HTTP protocol is another example of the textual approach.
Two main issues exist in marshalling:
● compactness: the resulting message should be as compact as possible.
● data type inclusion: CORBA's representation includes just the values of the objects transmitted; Java serialization and XML do include type information.
Two other techniques for external data representation are worthy of mention:
● Google uses an approach called protocol buffers to capture representations of both stored and transmitted data.
● JSON (JavaScript Object Notation).
Both of these methods represent a step towards more lightweight approaches to data representation (when compared, for example, to XML).
Particular attention is to be paid to remote object references. A remote object reference is an identifier for a remote object that is valid throughout a distributed system. A remote object reference is passed in the invocation message to specify which object is to be invoked. Remote object references must be generated in a manner that ensures uniqueness over space and time.
Also, object references must be unique among all of the processes in the various computers in a distributed system. One way is to construct a remote object reference by concatenating the Internet address of its host computer and the port number of the process that created it with the time of its creation and a local object number. The local object number is incremented each time an object is created in that process.
The last field of the remote object reference shown in Figure 4.13 contains some information about the interface of the remote object, for example, the interface name. This information is relevant to any process that receives a remote object reference as an argument or as the result of a remote invocation, because it needs to know about the methods offered by the remote object.

Idempotent operations: an operation is idempotent if it produces the same results whether it is executed once or multiple times.[7] In the case of methods or subroutine calls with side effects, for instance, it means that the modified state remains the same after the first call. In functional programming, though, an idempotent function is one that has the property f(f(x)) = f(x) for any value x.[8]
This is a very useful property in many situations, as it means that an operation can be repeated or retried as often as necessary without causing unintended effects. With non-idempotent operations, the algorithm may have to keep track of whether the operation was already performed or not.
In the HyperText Transfer Protocol (HTTP), idempotence and safety are the major attributes that separate HTTP verbs. Of the major HTTP verbs, GET, PUT, and DELETE are idempotent (if implemented according to the standard), but POST is not.[9]

Remote Procedure Call (RPC)
Request-reply protocols provide relatively low-level support for requesting the execution of a remote operation, and also provide direct support for RPC and RMI.
RPC allows client programs to call procedures transparently in server programs running in separate processes and generally in different computers from the client.
RPC has the goal of making the programming of distributed systems look similar, if not identical, to conventional programming, that is, achieving a high level of distribution transparency. This unification is achieved in a very simple manner, by extending the abstraction of a procedure call to distributed environments. In particular, in RPC, procedures on remote machines can be called as if they were procedures in the local address space. The underlying RPC system then hides important aspects of distribution, including the encoding and decoding of parameters and results, the passing of messages and the preserving of the required semantics for the procedure call.
Three issues are important in understanding this concept:
● programming with interfaces (the style of programming): in order to control the possible interactions between modules, an explicit interface is defined for each module; as long as its interface remains the same, the implementation may be changed without affecting the users of the module.
  ○ service interface: the specification of the procedures offered by a server, defining the types of the arguments of each of the procedures. But why an interface?
    ■ It is not possible for a client module running in one process to access the variables in a module in another process.
    ■ The parameter-passing mechanisms used in local procedure calls are not suitable when the caller and procedure are in different processes. In particular, call by reference is not supported. Rather, the specification of a procedure in the interface of a module in a distributed program describes the parameters as input or output, or sometimes both. Input parameters are passed to the remote server by sending the values of the arguments in the request message; output parameters are returned in the reply message and are used as the result of the call.
    ■ Addresses in one process are not valid in another, remote one.
  ○ Interface Definition Language (IDL): designed to allow procedures implemented in different languages to invoke one another. An IDL provides a notation for defining interfaces in which each of the parameters of an operation may be described as for input or output, in addition to having its type specified.
● the call semantics associated with RPC:
  ○ Retry request message: Controls whether to retransmit the request message until either a reply is received or the server is assumed to have failed.
  ○ Duplicate filtering: Controls when retransmissions are used and whether to filter out duplicate requests at the server.
  ○ Retransmission of results: Controls whether to keep a history of result messages to enable lost results to be retransmitted without re-executing the operations at the server.
  Combinations of these choices lead to a variety of possible semantics for the reliability of remote invocations as seen by the invoker. Note that for local procedure calls, the semantics are exactly once, meaning that every procedure is executed exactly once (except in the case of process failure). The choices of RPC invocation semantics are defined as follows.
  ○ Maybe semantics: the remote procedure call may be executed once or not at all. Maybe semantics arises when no fault-tolerance measures are applied, and can suffer from the following types of failure:
    ■ omission failures, if the request or result message is lost;
    ■ crash failures, when the server containing the remote operation fails.
    Useful only for applications in which occasional failed calls are acceptable.
  ○ At-least-once semantics: the invoker receives either a result, in which case the invoker knows that the procedure was executed at least once, or an exception informing it that no result was received. It can be achieved by the retransmission of request messages, which masks the omission failures of the request or result message. At-least-once semantics can suffer from the following types of failure:
    ■ crash failures, when the server containing the remote procedure fails;
    ■ arbitrary failures: when the request message is retransmitted, the remote server may receive it and execute the procedure more than once, possibly causing wrong values to be stored or returned.
    If the operations in a server can be designed so that all of the procedures in their service interfaces are idempotent operations, then at-least-once call semantics may be acceptable.
  ○ At-most-once semantics: the caller receives either a result, in which case the caller knows that the procedure was executed exactly once, or an exception informing it that no result was received, in which case the procedure will have been executed either once or not at all. It can be achieved by using all of the fault-tolerance measures outlined in Figure 5.9.
● Transparency: RPC strives to offer at least location and access transparency, hiding the physical location of the (potentially remote) procedure and also accessing local and remote procedures in the same way. Middleware can also offer additional levels of transparency to RPC. However, remote procedure calls suffer from the following:
  ○ vulnerability to failures of the network and/or of the remote server process, with no ability to distinguish between them.
  ○ latency: the latency of a remote procedure call is several orders of magnitude greater than that of a local one.
  The current consensus is that remote calls should be made transparent in the sense that the syntax of a remote call is the same as that of a local invocation, but that the difference between local and remote calls should be expressed in their interfaces.

RPC Implementation: The software components required to implement RPC are shown in Figure 5.10. The client that accesses a service includes one stub procedure for each procedure in the service interface.
The stub procedure behaves like a local procedure to the client, but instead of executing the call, it marshals the procedure identifier and the arguments into a request message, which it sends via its communication module to the server. When the reply message arrives, it unmarshals the results. The server process contains a dispatcher together with one server stub procedure and one service procedure for each procedure in the service interface. The dispatcher selects one of the server stub procedures according to the procedure identifier in the request message. The server stub procedure then unmarshals the arguments in the request message, calls the corresponding service procedure and marshals the return values for the reply message. The service procedures implement the procedures in the service interface. The client and server stub procedures and the dispatcher can be generated automatically by an interface compiler from the interface definition of the service.
RPC is generally implemented over a request-reply protocol like the ones discussed so far. The contents of request and reply messages are the same as those illustrated for request-reply protocols in Figure 5.4. RPC may be implemented to have one of the choices of invocation semantics discussed: at-least-once or at-most-once is generally chosen. To achieve this, the communication module will implement the desired design choices in terms of retransmission of requests, dealing with duplicates and retransmission of results, as shown in Figure 5.9.

Remote Method Invocation (RMI)
RMI allows objects in different processes to communicate with one another; it is an extension of local method invocation that allows an object living in one process to invoke the methods of an object living in another process.
The commonalities between RMI and RPC are as follows:
● Both support programming with interfaces.
● Both are typically constructed on top of request-reply protocols and can offer a range of call semantics such as at-least-once and at-most-once.
● Both offer a similar level of transparency: local and remote calls employ the same syntax, but remote interfaces typically expose the distributed nature of the underlying call, for example by supporting remote exceptions.
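Since both RPC and RMI sit on top of a request-reply protocol, the stub, dispatcher and at-most-once machinery described above can be sketched in one place. This is a rough in-process illustration, not a real RPC framework: the names `Server`, `ClientStub` and `retransmitting_channel`, the JSON message layout, and the `deposit` procedure are all assumptions, and the "network" is a plain function call that deliberately delivers every request twice to imitate a lost reply followed by a retransmission.

```python
import json
from typing import Any, Callable, Dict, Tuple


class Server:
    """Server side: dispatcher, one service procedure, and a history of replies."""

    def __init__(self) -> None:
        self.balance = 0
        # Dispatcher table: procedure identifier -> service procedure.
        self.procedures: Dict[str, Callable[..., Any]] = {"deposit": self.deposit}
        # Reply history keyed by (client id, request id), used for duplicate
        # filtering and for retransmitting results without re-executing.
        self.history: Dict[Tuple[str, int], bytes] = {}

    def deposit(self, amount: int) -> int:
        """A deliberately non-idempotent service procedure."""
        self.balance += amount
        return self.balance

    def handle(self, request: bytes) -> bytes:
        msg = json.loads(request.decode("utf-8"))       # unmarshal the request
        key = (msg["client"], msg["id"])
        if key in self.history:                         # duplicate request:
            return self.history[key]                    # resend the stored reply
        result = self.procedures[msg["proc"]](*msg["args"])   # dispatch
        reply = json.dumps({"id": msg["id"], "result": result}).encode("utf-8")
        self.history[key] = reply
        return reply


class ClientStub:
    """Client stub: looks like a local object, but marshals each call into a message."""

    def __init__(self, client_id: str, send: Callable[[bytes], bytes]) -> None:
        self.client_id = client_id
        self.send = send
        self.next_id = 0

    def deposit(self, amount: int) -> int:
        self.next_id += 1
        request = json.dumps({"client": self.client_id, "id": self.next_id,
                              "proc": "deposit", "args": [amount]}).encode("utf-8")
        reply = json.loads(self.send(request).decode("utf-8"))   # unmarshal the reply
        return reply["result"]


def retransmitting_channel(server: Server) -> Callable[[bytes], bytes]:
    """Stand-in for the communication module: every request reaches the server twice."""
    def send(request: bytes) -> bytes:
        server.handle(request)           # the first reply is treated as lost
        return server.handle(request)    # the retransmitted request gets the stored reply
    return send


server = Server()
account = ClientStub("c1", retransmitting_channel(server))
print(account.deposit(10))   # 10
print(server.balance)        # 10, although the request arrived twice
```

Without the history lookup in `handle`, the same run would apply the deposit twice, which is the at-least-once behaviour that is only safe for idempotent procedures. A real implementation would also need timeouts, acknowledgements and a way to bound the history, none of which appear here.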
The following differences give RMI added expressiveness:
● The programmer is able to use the full expressive power of object-oriented programming in the development of distributed systems software.
● Building on the concept of object identity in object-oriented systems, all objects in an RMI-based system have unique object references (whether they are local or remote). Such object references can also be passed as parameters, thus offering significantly richer parameter-passing semantics than in RPC.
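Passing an object reference as a parameter means the reference itself has to be marshalled. The sketch below builds a reference from the fields listed earlier (Internet address, port number, creation time, local object number, interface name) and flattens it to bytes. The class names, the IPv4-only address, the 32-bit field widths and the length-prefixed interface name are assumptions of this sketch, not a wire format defined in the notes.

```python
import itertools
import socket
import struct
import time
from dataclasses import dataclass

_HEADER = "!4sIIIH"  # network byte order: address, port, time, object number, name length


@dataclass(frozen=True)
class RemoteObjectRef:
    host: str            # IPv4 address of the host computer
    port: int            # port of the process that created the object
    created: int         # creation time, in seconds
    object_number: int   # local object number
    interface: str       # interface name of the remote object

    def marshal(self) -> bytes:
        """Flatten the reference into a sequence of bytes."""
        name = self.interface.encode("utf-8")
        header = struct.pack(_HEADER, socket.inet_aton(self.host), self.port,
                             self.created, self.object_number, len(name))
        return header + name

    @classmethod
    def unmarshal(cls, data: bytes) -> "RemoteObjectRef":
        """Rebuild an equal reference from its byte form."""
        addr, port, created, number, length = struct.unpack_from(_HEADER, data)
        start = struct.calcsize(_HEADER)
        name = data[start:start + length].decode("utf-8")
        return cls(socket.inet_ntoa(addr), port, created, number, name)


class RefFactory:
    """Hands out references for one process; the object number grows with each object."""

    def __init__(self, host: str, port: int) -> None:
        self.host, self.port = host, port
        self._numbers = itertools.count(1)

    def new_ref(self, interface: str) -> RemoteObjectRef:
        return RemoteObjectRef(self.host, self.port, int(time.time()),
                               next(self._numbers), interface)


ref = RemoteObjectRef("192.168.0.7", 5000, 1700000000, 1, "Account")
wire = ref.marshal()
print(len(wire))                               # 25
print(RemoteObjectRef.unmarshal(wire) == ref)  # True
```

Combining the address and port (uniqueness over space) with the creation time and a per-process counter (uniqueness over time) is what keeps two references from colliding even if a process restarts on the same host and port.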
