3: Large-scale Distributed Systems

                                   Zubair Nabi

                        zubair.nabi@itu.edu.pk


                                 April 17, 2013




Zubair Nabi        3: Large-scale Distributed Systems   April 17, 2013   1 / 29
Outline



  1    Introduction


  2    Client-server Interaction


  3    Characteristics


  4    Message Passing Interface




Distributed Systems




          Set of discrete machines which cooperate to perform computation
          Give the notion of a single “machine”
          Examples:
                Compute clusters
                Distributed storage systems, such as Dropbox, Google Drive, etc.
                The Web




Advantages

          Scalability:
                The scale of the Internet (think how many queries Google servers
                handle daily)
                Only a matter of adding more machines
                Cheaper than supercomputers
                More machines means more parallelism, hence better performance
          Sharing:
                The same resource is shared between multiple users
                Just like the Internet is shared between millions of users
          Communication:
                Communication between (potentially geographically isolated) machines
                and users (via email, Facebook, etc.)
          Reliability:
                The service can remain active even if multiple machines go down



Challenges


          Concurrency:
                Concurrent execution requires some form of coordination
          Fault-tolerance:
                Any component can fail at any instant due to a software bug or a
                hardware fault
          Security:
                One machine can compromise the entire system
          Coordination:
                No global time so non-trivial to coordinate
          Troubleshooting:
                Hard to troubleshoot because it is hard to reason about the system




Transparency


          Distributed systems give the notion of a single machine or keep the
          distribution transparent
          The degree of this transparency can be mapped onto an entire
          spectrum of options for both users and programmers
          For instance:
                A web user is aware of network communication but the number of
                accessed machines is transparent
          Transparency can be ensured by middleware that adds a layer of
          abstraction
          Can span access, concurrency, failure, location, migration,
          persistence, relocation, replication




Request-reply protocol



          Standard operation
                1   Client sends request to the server
                2   Server processes the request and sends a corresponding response
          In the synchronous model, the client blocks till the response is received
          In case of the asynchronous model, the client continues its execution
          For instance: HTTP/1.0
                1   Client sends GET /index.html
                2   Server responds with index.html
                3   Client renders index.html
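
The synchronous request-reply exchange above can be sketched in a few lines of Python. This is a minimal illustration, not real HTTP: the one-line wire format, the PAGES table, and the function names serve_one and fetch are assumptions made for the example. A connected socket pair stands in for the network.

```python
import socket
import threading

# Hypothetical document store standing in for the server's files.
PAGES = {"/index.html": "<html>hello</html>"}

def serve_one(conn):
    """Server side: read one request, process it, send one reply."""
    with conn:
        request = conn.recv(1024).decode()      # e.g. "GET /index.html"
        method, path = request.split()
        body = PAGES.get(path)
        if method == "GET" and body is not None:
            reply = f"200 {body}"
        else:
            reply = "404 Not Found"
        conn.sendall(reply.encode())

def fetch(path):
    """Client side: send a request, then block until the reply arrives."""
    client, server = socket.socketpair()        # in-process stand-in for a network link
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    with client:
        client.sendall(f"GET {path}".encode())
        reply = client.recv(1024).decode()      # synchronous model: the client blocks here
    worker.join()
    return reply
```

Calling fetch("/index.html") returns the page prefixed with a status, while an unknown path yields the "404 Not Found" reply.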




Errors and failures




          Errors are handled at the application-level
                For instance, if the client requests a non-existent web page just return a
                special reply: 404 Not Found
          Failures are system-level things
                For instance, lost message, client/server crash, etc.
          To handle failures, the client must time out after a period T
                The client can retry on a timeout
                Setting value of T is system-specific
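
A rough sketch of timeout-and-retry on the client side, under stated assumptions: FlakyServer is a hypothetical server that drops its first few requests to simulate lost messages, and timeout_s plays the role of T. A queue stands in for the reply channel.

```python
import queue

class FlakyServer:
    """Hypothetical server that silently drops the first `drops` requests."""

    def __init__(self, drops):
        self.drops = drops
        self.seen = 0

    def handle(self, request, reply_box):
        self.seen += 1
        if self.seen <= self.drops:
            return                          # simulate a lost message: no reply is sent
        reply_box.put(f"reply to {request}")

def request_with_retry(server, request, timeout_s, max_attempts):
    """Send `request`, waiting at most `timeout_s` (the T above) per attempt."""
    for attempt in range(1, max_attempts + 1):
        reply_box = queue.Queue(maxsize=1)
        server.handle(request, reply_box)
        try:
            return reply_box.get(timeout=timeout_s), attempt
        except queue.Empty:
            continue                        # timed out: try again
    raise TimeoutError(f"no reply after {max_attempts} attempts")
```

Note that a retry can cause the server to see the same request more than once, so retried operations should be safe to repeat.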




Remote Procedure Call


          Request/response protocols are widely used but too low level
                Need to define each request separately including their network message
                representation
          Remote procedure call (RPC) presents a simpler abstraction
                Programmer invokes a procedure which executes on a remote machine
                (the server)
                RPC subsystem takes care of message formats, communication,
                timeouts, etc.
          Distribution of the system becomes transparent
          Integrated with the programming language
          RPC layer adds stubs at client end which when invoked execute a
          method at the server
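
The stub idea can be illustrated with a toy RPC layer. This is a sketch only: the class names Server and ClientStub, the JSON message layout, and the add procedure are assumptions for the example, and the "network" is a direct in-process call to the server's handler.

```python
import json

class Server:
    """Server side: dispatches decoded requests to registered procedures."""

    def __init__(self):
        self.procedures = {}

    def register(self, func):
        self.procedures[func.__name__] = func

    def handle(self, message):
        call = json.loads(message)
        result = self.procedures[call["method"]](*call["params"])
        return json.dumps({"result": result})

class ClientStub:
    """Client side: turns ordinary-looking method calls into remote invocations."""

    def __init__(self, transport):
        self.transport = transport          # callable: request string -> reply string

    def __getattr__(self, name):
        # Called only for attributes that do not exist locally,
        # i.e. the names of remote procedures.
        def remote_call(*params):
            message = json.dumps({"method": name, "params": params})
            reply = self.transport(message)
            return json.loads(reply)["result"]
        return remote_call

def add(a, b):
    return a + b

server = Server()
server.register(add)
stub = ClientStub(transport=server.handle)  # in-process stand-in for the network
```

From the programmer's point of view, stub.add(2, 3) reads like a local call; the marshalling and dispatch happen behind the stub.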




Example: XML-RPC


         XML is used to encode method invocations (method names,
         parameters, etc.)
         HTTP POST used to send request and receive response (also
         encoded in XML)
         Looks like a regular web session on wire so plays well with
         middleboxes
         Language agnostic and extensible
         Extended with more features (namespaces, user-defined types, etc.)
         and diverse transports (TCP, UDP, etc.) to result in Simple Object
         Access Protocol (SOAP)
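
To see what an XML-encoded invocation looks like, Python's standard xmlrpc.client module can marshal a call without any network traffic. The method name add and its arguments are hypothetical, chosen only for illustration.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method add(2, 3) as an XML-RPC request body.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")

# Decoding the request recovers the parameters and the method name.
params, method = xmlrpc.client.loads(request_xml)

# The response is XML as well; here the server's answer is assumed to be 5.
response_xml = xmlrpc.client.dumps((5,), methodresponse=True)
(result,), _ = xmlrpc.client.loads(response_xml)

print(request_xml)
```

In a real deployment the request body would travel in an HTTP POST and the response body would come back in the HTTP reply.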




RPC shortcomings




          Traditional RPC mechanisms are synchronous
                Client blocks till response is received
                Poor responsiveness, especially in high latency networks
          2006 ushered in the age of Asynchronous JavaScript and XML (AJAX)
                Update web page without reloading
                For instance, Google Maps, Gmail, etc.
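
The difference between blocking and asynchronous invocation can be sketched with Python's concurrent.futures. The function slow_remote_call is a hypothetical stand-in for a high-latency remote call; the 50 ms delay is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_remote_call(x):
    """Hypothetical remote call; the sleep simulates network latency."""
    time.sleep(0.05)
    return x * 2

def run_client():
    log = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_remote_call, 21)   # returns immediately
        log.append("request sent")
        log.append("client keeps working")           # not blocked on the reply
        log.append(f"reply: {future.result()}")      # block only when the value is needed
    return log
```

A synchronous client would sit idle for the whole round trip; here the wait is deferred until the result is actually required.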




Representational State Transfer




          AJAX still revolves around RPC (just asynchronously)
          Representational State Transfer (REST) offers an alternative
                All resources have a name: URL or URI
                Resources are manipulated with PUT, GET, POST, and DELETE
                methods
                State is sent along with operations
          Widely used these days (for instance, by Amazon, Twitter, etc.)
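
A small in-memory sketch of the REST style: every resource has a name (a path), and the four methods manipulate it. The ResourceStore class and the /notes paths are hypothetical, and the returned numbers follow common HTTP status code usage rather than any particular service's API.

```python
class ResourceStore:
    """Toy resource server: maps (method, path, body) to a (status, payload) pair."""

    def __init__(self):
        self.resources = {}
        self.next_id = 1

    def handle(self, method, path, body=None):
        if method == "POST":                    # create a new resource under a collection
            new_path = f"{path}/{self.next_id}"
            self.next_id += 1
            self.resources[new_path] = body
            return 201, new_path
        if method == "PUT":                     # create or replace at a known name
            self.resources[path] = body
            return 200, body
        if path not in self.resources:
            return 404, None
        if method == "GET":                     # read the current representation
            return 200, self.resources[path]
        if method == "DELETE":
            del self.resources[path]
            return 204, None
        return 405, None                        # method not supported
```

Each request carries everything the server needs (method, name, and representation), which is what "state is sent along with operations" amounts to in this sketch.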




Clocks




          Distributed systems need to be able to:
                Order events produced by concurrent processes
                Synchronize senders and receivers of messages
                Serialize concurrent accesses to shared objects
                Coordinate joint activity
          Clocks are employed for this
          But quartz oscillators oscillate at slightly different frequencies leading
          to clock drift and resulting in clock skew between clocks
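
A back-of-the-envelope calculation shows how quickly drift turns into skew. The sketch assumes each clock drifts by at most a fixed rate in parts per million (ppm); the 50 ppm figure used below is an illustrative assumption, not a number from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def worst_case_skew(drift_ppm, elapsed_s):
    """Upper bound on the skew between two clocks that each drift by at most
    `drift_ppm` parts per million, in the worst case in opposite directions."""
    drift_rate = drift_ppm / 1_000_000
    return 2 * drift_rate * elapsed_s

# Two clocks at an assumed 50 ppm, left unsynchronized for one day.
skew_after_one_day = worst_case_skew(50, SECONDS_PER_DAY)
print(f"{skew_after_one_day:.2f} s")
```

Under these assumptions the two clocks can disagree by several seconds after a single day, which is why periodic synchronization is needed.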




Clock synchronization




          Clock synchronization algorithms try to minimize skew between a set of
          clocks
                Decide upon a correct time
                Communicate to agree (compensating for delays)
                Possibly multiple servers involved
          In reality, still a 1-10 ms skew after sync (but we can live with that)
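
One well-known way to compensate for delays is Cristian's algorithm, which the slides do not name; it is shown here only as an example of the idea. The client assumes the reply took half the measured round trip to arrive. Function and parameter names are assumptions, and all times are in milliseconds.

```python
def estimate_server_time(t_sent, server_time, t_received):
    """Estimate the server's current time when the reply arrives.

    t_sent and t_received are read from the client's clock; server_time is
    the timestamp carried in the reply. Assumes symmetric network delay.
    """
    round_trip = t_received - t_sent
    return server_time + round_trip / 2

def clock_offset(t_sent, server_time, t_received):
    """Correction the client would apply to its own clock."""
    return estimate_server_time(t_sent, server_time, t_received) - t_received
```

For example, a request sent at client time 1000 and answered at 1020 with a server timestamp of 5000 gives an estimate of 5010. The estimate is off by at most half the round trip, which is one reason some skew remains after synchronization.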




Ordering




          Time is used to ensure ordering
                Withdraw money at 23:59:45
                Bank calculates interest at 00:00:00
                The withdrawn money should not be included in the interest calculation
          In most cases, we only need to know that a happened before b, known as
          the happens-before relation
          Multiple algorithms exist to ensure the happens-before relation
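
One well-known way to capture the happens-before relation without synchronized wall clocks is a Lamport logical clock. The slides do not say which algorithms they have in mind, so the sketch below is only one possible illustration; the class name and the teller/bank scenario are assumptions made for this example.

```python
class LamportClock:
    """Logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Any local event advances the counter.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past the sender's timestamp so the receive event is
        # ordered after the send event.
        self.time = max(self.time, msg_time) + 1
        return self.time


teller = LamportClock()
bank = LamportClock()

withdraw_ts = teller.send()              # withdrawal is recorded and sent
bank.tick()                              # unrelated local event at the bank
interest_ts = bank.receive(withdraw_ts)  # bank learns of the withdrawal

print(withdraw_ts < interest_ts)
```

If a happened before b, the timestamp of a is smaller than that of b. The converse does not hold: a smaller timestamp alone does not prove that one event happened before the other.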




Distributed Mutual Exclusion




          Concurrent access to shared resources needs to be synchronized
          On a local machine, this relies on hardware support
                Locks, semaphores, etc.
          But this support is not available across a distributed system




Distributed Mutual Exclusion (2)




  Multiple methods exist to ensure this:
          Central lock server: All lock requests are handled by a central server
          Token passing: Nodes are arranged in a ring and a token is passed
          around; only the current token holder may access the resource
          Totally-ordered multicast: Clients multicast requests to each other
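
The central lock server approach can be sketched as below. This is an in-process simulation only: a real server would receive acquire and release requests as network messages and would have to deal with clients that crash while holding the lock. The class and method names are hypothetical.

```python
from collections import deque


class CentralLockServer:
    """Grants one lock to one client at a time, in request order."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def acquire(self, client):
        """Return True if the lock is granted now, False if queued."""
        if self.holder is None:
            self.holder = client
            return True
        self.waiting.append(client)
        return False

    def release(self, client):
        """Release the lock; return the next holder, or None."""
        if self.holder != client:
            raise ValueError(f"{client} does not hold the lock")
        # Hand the lock to the longest-waiting client, if any.
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


server = CentralLockServer()
granted_a = server.acquire("A")   # lock is free, so A gets it
granted_b = server.acquire("B")   # B has to wait behind A
next_holder = server.release("A") # lock passes to B
print(granted_a, granted_b, next_holder)
```

The design is simple, but the server is a single point of failure and a potential bottleneck, which is part of the motivation for the token-passing and multicast alternatives.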




Consensus




         Getting processes in a distributed system to agree on something
         Requirements for correct solution
               Agreement: All nodes arrive at the same answer
               Validity: Answer is one that was proposed by someone
               Termination: All nodes eventually decide
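
The three requirements can be expressed as checks over the outcome of a single finished run. This is a sketch for illustration, not a consensus protocol: it only inspects what each node proposed and decided, and the function name and data layout are assumptions. Termination in particular is a property of all runs, so here it only means "every node has decided by the end of this run".

```python
def check_consensus(proposals, decisions):
    """Check one run against the three consensus requirements.

    proposals: dict mapping node -> value it proposed
    decisions: dict mapping node -> value it decided, or None if undecided
    """
    decided = [v for v in decisions.values() if v is not None]
    return {
        # Agreement: no two nodes decided different values.
        "agreement": len(set(decided)) <= 1,
        # Validity: every decided value was proposed by some node.
        "validity": all(v in proposals.values() for v in decided),
        # Termination: every participating node reached a decision.
        "termination": all(decisions.get(n) is not None for n in proposals),
    }


proposals = {"n1": "x", "n2": "y", "n3": "x"}
good_run = check_consensus(proposals, {"n1": "x", "n2": "x", "n3": "x"})
bad_run = check_consensus(proposals, {"n1": "x", "n2": "z", "n3": None})
print(good_run)
print(bad_run)
```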




Distributed transactions



          Composite operations (i.e., a collection of reads and updates to a set of
          objects)
          A transaction is atomic
                If it commits, all operations are applied
                If it aborts, no state mutation at all
          Distributed transactions span multiple transaction processing servers
                For instance, booking flights: Lahore -> Dubai -> New York
                Need to book entire trip
          Actions need to be coordinated across multiple parties
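
One common way to coordinate such a booking is a two-phase commit, which the slides do not name explicitly. The sketch below assumes one coordinator and two hypothetical airline servers, one per leg of the Lahore -> Dubai -> New York trip; it ignores crashes, timeouts, and concurrent transactions.

```python
class Airline:
    """One transaction processing server, responsible for one leg."""

    def __init__(self, name, seats):
        self.name = name
        self.seats = seats
        self.held = 0

    def prepare(self):
        """Phase 1: tentatively hold a seat and vote yes, or vote no."""
        if self.seats > 0:
            self.seats -= 1
            self.held += 1
            return True
        return False

    def commit(self):
        """Phase 2 (commit): the held seat becomes a confirmed booking."""
        self.held -= 1

    def abort(self):
        """Phase 2 (abort): give back any seat held in phase 1."""
        if self.held:
            self.held -= 1
            self.seats += 1


def book_trip(legs):
    """Coordinator: commit only if every leg voted yes."""
    votes = [leg.prepare() for leg in legs]
    if all(votes):
        for leg in legs:
            leg.commit()
        return True
    for leg in legs:
        leg.abort()
    return False


lahore_dubai = Airline("Lahore -> Dubai", seats=1)
dubai_new_york = Airline("Dubai -> New York", seats=0)
first_try = book_trip([lahore_dubai, dubai_new_york])
seats_after_abort = lahore_dubai.seats  # the held seat was returned

dubai_new_york.seats = 1                # a seat becomes available
second_try = book_trip([lahore_dubai, dubai_new_york])
print(first_try, seats_after_abort, second_try)
```

The first attempt aborts because the second leg is full, and the first leg's seat is released, so no state is mutated. The second attempt books both legs together.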




Replication


          A number of distributed systems involve replication
                    Data replication: Multiple copies of some object stored at different
                    servers
                    Computation replication: Multiple servers capable of providing an
                    operation
          Advantages:
                1 Load balancing: Work spread out across replicas
                2 Lower latency: Better performance if replica close to the client
                3 Fault tolerance: Failure of some replicas can be tolerated

          Examples: DNS, content distribution networks, database replication,
          etc.
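
The lower-latency and fault-tolerance advantages can be illustrated with a client that tries replicas from nearest to farthest and skips any that have failed. This is a toy sketch: the replica list, latencies, and function name are made up for illustration, and a failed replica is modelled simply as `None`.

```python
def read_from_replicas(key, replicas):
    """Read a key from the closest replica that is up and has it.

    replicas: list of (latency_ms, store) pairs, where store is a dict
    holding the replica's data, or None if that replica has failed.
    Returns (value, latency_ms) of the replica that answered.
    """
    # Try the closest replica first (lower latency).
    for latency_ms, store in sorted(replicas, key=lambda r: r[0]):
        # Skip failed replicas (fault tolerance).
        if store is not None and key in store:
            return store[key], latency_ms
    raise LookupError(f"no live replica holds {key!r}")


replicas = [
    (80, {"index.html": "<h1>hi</h1>"}),  # far away but healthy
    (5, None),                            # closest, but currently down
    (20, {"index.html": "<h1>hi</h1>"}),  # nearby and healthy
]
value, latency = read_from_replicas("index.html", replicas)
print(value, latency)
```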




CAP


         CAP:
               1 Consistency: All nodes see the same state
               2 Availability: All requests get a response
               3 Partition tolerance: System continues to operate even when the
                 network splits or nodes fail
         Brewer’s conjecture states that in a distributed system only 2 out of
         these 3 are possible at the same time
         In the current setup, partitioning is a given: Hardware/software fails all
         the time
         Therefore, systems need to choose between consistency and
         availability
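
The choice between consistency and availability during a partition can be sketched with a toy two-replica store. The class, the "CP"/"AP" mode labels, and the replica names are assumptions for illustration and are not taken from any real system.

```python
class TwoReplicaStore:
    """Toy store with replicas "a" and "b" and a switchable partition."""

    def __init__(self, mode):
        self.mode = mode                  # "CP" or "AP"
        self.replicas = {"a": "v1", "b": "v1"}
        self.partitioned = False

    def write(self, replica, value):
        if not self.partitioned:
            # Normal operation: every replica gets the update.
            for name in self.replicas:
                self.replicas[name] = value
        elif self.mode == "CP":
            # Prefer consistency: refuse rather than let replicas diverge.
            raise RuntimeError("write rejected during partition")
        else:
            # Prefer availability: accept locally, replicas now disagree.
            self.replicas[replica] = value

    def read(self, replica):
        return self.replicas[replica]


ap = TwoReplicaStore("AP")
ap.partitioned = True
ap.write("a", "v2")
print(ap.read("a"), ap.read("b"))   # replicas disagree, but both respond

cp = TwoReplicaStore("CP")
cp.partitioned = True
try:
    cp.write("a", "v2")
    cp_write_ok = True
except RuntimeError:
    cp_write_ok = False
print(cp_write_ok, cp.read("a"), cp.read("b"))
```

In the AP case every request gets a response but the two replicas return different values; in the CP case the replicas stay identical but the write is refused.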




References




          George Coulouris, Jean Dollimore, Tim Kindberg, and Gordon Blair.
          2011. Distributed Systems: Concepts and Design (5th ed.).
          Addison-Wesley Publishing Company, USA.




Topic 3: Large-scale Distributed Systems

  • 1. 3: Large-scale Distributed Systems Zubair Nabi zubair.nabi@itu.edu.pk April 17, 2013 Zubair Nabi 3: Large-scale Distributed Systems April 17, 2013 1 / 29
  • 2. Outline
      1 Introduction
      2 Client-server Interaction
      3 Characteristics
      4 Message Passing Interface
  • 8. Distributed Systems
      - Set of discrete machines which cooperate to perform computation
      - Give the notion of a single “machine”
      - Examples:
          - Compute clusters
          - Distributed storage systems, such as Dropbox, Google Drive, etc.
          - The Web
  • 16. Advantages
      - Scalability: The scale of the Internet (think how many queries Google servers handle daily)
          - Only a matter of adding more machines
          - Cheaper than supercomputers
          - More machines means more parallelism, hence better performance
      - Sharing: The same resource is shared between multiple users
          - Just like the Internet is shared between millions of users
      - Communication: Communication between (potentially geographically isolated) machines and users (via email, Facebook, etc.)
      - Reliability: The service can remain active even if multiple machines go down
  • 21. Challenges
      - Concurrency: Concurrent execution requires some form of coordination
      - Fault-tolerance: Any component can fail at any instant due to a software or a hardware bug
      - Security: One machine can compromise the entire system
      - Coordination: No global time, so non-trivial to coordinate
      - Troubleshooting: Hard to troubleshoot because it is hard to reason about the system
  • 26. Transparency
      - Distributed systems give the notion of a single machine or keep the distribution transparent
      - The degree of this transparency can be mapped onto an entire spectrum of options for both users and programmers
          - For instance: A web user is aware of network communication but the number of accessed machines is transparent
      - Transparency can be ensured by middleware that adds a layer of abstraction
      - Can span access, concurrency, failure, location, migration, persistence, relocation, replication
  • 27. Outline
      1 Introduction
      2 Client-server Interaction
      3 Characteristics
      4 Message Passing Interface
  • 35. Request-reply protocol
      - Standard operation:
          1. Client sends request to the server
          2. Server processes the request and sends a corresponding response
      - In the synchronous model, the client blocks until the response is received
      - In the asynchronous model, the client continues its execution
      - For instance, HTTP 1.0:
          1. Client sends GET /index.html
          2. Server responds with index.html
          3. Client renders index.html
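The difference between the synchronous and asynchronous models can be sketched in a few lines. This is a minimal in-process illustration, not a real HTTP client: `server_handle`, `PAGES`, and both `*_get` helpers are hypothetical names, and a thread pool stands in for the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-process "server": maps a request line to a reply body.
PAGES = {"/index.html": "<html>hello</html>"}

def server_handle(request: str) -> str:
    method, path = request.split()
    return PAGES.get(path, "404 Not Found")

def synchronous_get(path: str) -> str:
    # The caller blocks here until the reply is available.
    return server_handle(f"GET {path}")

def asynchronous_get(executor: ThreadPoolExecutor, path: str):
    # Returns a Future immediately; the caller keeps running and
    # collects the reply later with .result().
    return executor.submit(server_handle, f"GET {path}")

with ThreadPoolExecutor(max_workers=1) as pool:
    sync_reply = synchronous_get("/index.html")
    future = asynchronous_get(pool, "/index.html")
    other_work = sum(range(10))      # client continues its execution
    async_reply = future.result()    # reply collected when it is needed
```

Both calls return the same reply; only the point at which the client waits for it differs.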
  • 42. Errors and failures
      - Errors are handled at the application level
          - For instance, if the client requests a non-existent web page, just return a special reply: 404 Not Found
      - Failures are system-level events
          - For instance, lost message, client/server crash, etc.
      - To handle failure, the client must time out after T
          - The client can retry on a timeout
          - Setting the value of T is system-specific
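A timeout-and-retry loop along these lines might look as follows. This is a sketch under stated assumptions: the transport is simulated, and `RequestTimeout`, `call_with_retry`, and `make_flaky_transport` are illustrative names rather than part of any library.

```python
class RequestTimeout(Exception):
    """Raised by the (hypothetical) transport when no reply arrives within T."""

def call_with_retry(send, request, timeout_s, max_attempts=3):
    """Send `request`, retrying on timeout up to `max_attempts` times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(request, timeout_s)
        except RequestTimeout:
            if attempt == max_attempts:
                raise  # give up and report the failure to the caller

def make_flaky_transport(failures_before_success):
    """Simulated transport that loses the first N messages."""
    state = {"calls": 0}

    def send(request, timeout_s):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RequestTimeout(f"no reply within {timeout_s}s")
        if request == "GET /missing.html":
            return "404 Not Found"  # application-level error: an ordinary reply
        return "200 OK"

    return send, state

send, state = make_flaky_transport(failures_before_success=2)
reply = call_with_retry(send, "GET /index.html", timeout_s=0.5)
```

Note that the 404 case travels as a normal reply, while a lost message surfaces only as a timeout. Blind retries are safe only when repeating the request has no extra effect on the server.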
  • 50. Remote Procedure Call
      - Request/response protocols are widely used but too low level
          - Need to define each request separately, including its network message representation
      - Remote procedure call (RPC) presents a simpler abstraction
          - Programmer invokes a procedure which executes on a remote machine (the server)
          - RPC subsystem takes care of message formats, communication, timeouts, etc.
          - Distribution of the system becomes transparent
      - Integrated with the programming language
      - RPC layer adds stubs at the client end which, when invoked, execute a method at the server
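The role of a client stub can be illustrated with a toy RPC layer. The sketch below assumes JSON as the message format and passes bytes directly to the server function instead of over a network; `Stub`, `server_dispatch`, and `PROCEDURES` are hypothetical names chosen for the example.

```python
import json

# --- server side ---
def add(a, b):
    return a + b

PROCEDURES = {"add": add}  # procedures the server exposes

def server_dispatch(message: bytes) -> bytes:
    """Unmarshal a request, run the procedure, marshal the reply."""
    call = json.loads(message)
    result = PROCEDURES[call["method"]](*call["params"])
    return json.dumps({"result": result}).encode()

# --- client side ---
class Stub:
    """Client stub: looks like a local object, forwards calls to the server."""

    def __init__(self, transport):
        self._transport = transport  # callable taking and returning bytes

    def __getattr__(self, name):
        # Any unknown attribute is treated as the name of a remote procedure.
        def remote_call(*params):
            request = json.dumps({"method": name, "params": params}).encode()
            reply = self._transport(request)
            return json.loads(reply)["result"]
        return remote_call

server = Stub(transport=server_dispatch)  # a real transport would use the network
total = server.add(2, 3)
```

From the caller's point of view `server.add(2, 3)` reads like a local call; message formats and transport stay hidden inside the stub.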
  • 55. Example: XML-RPC
      - XML is used to encode method invocations (method names, parameters, etc.)
      - HTTP POST is used to send the request and receive the response (also encoded in XML)
      - Looks like a regular web session on the wire, so it plays well with middleboxes
      - Language agnostic and extensible
      - Extended with more features (namespaces, user-defined types, etc.) and diverse transports (TCP, UDP, etc.) to result in Simple Object Access Protocol (SOAP)
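The XML encoding can be inspected with Python's standard `xmlrpc.client` module, without opening any connection. The remote method name `add` and its arguments are made up for illustration.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method add(2, 3).
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")

# What a server would do on receipt: decode the parameters and method name.
params, method = xmlrpc.client.loads(request_xml)

# Encode the reply; an XML-RPC response carries exactly one value.
response_xml = xmlrpc.client.dumps((5,), methodresponse=True)
(result,), _ = xmlrpc.client.loads(response_xml)
```

Printing `request_xml` shows a `<methodCall>` document containing the method name and parameters. In a live setting, `xmlrpc.client.ServerProxy` would send such a document in the body of an HTTP POST.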
  • 61. RPC shortcomings
      - RPC mechanisms are synchronous
          - Client blocks until the response is received
          - Poor responsiveness, especially in high-latency networks
      - 2006 ushered in the age of Asynchronous JavaScript and XML (AJAX)
          - Update web page without reloading
          - For instance, Google Maps, Gmail, etc.
  • 66. Representational State Transfer
      - AJAX still revolves around RPC (just asynchronously)
      - Representational State Transfer (REST) offers an alternative
          - All resources have a name: URL or URI
          - Resources are manipulated with PUT, GET, POST, and DELETE methods
          - State is sent along with operations
      - Widely used these days (for instance, by Amazon, Twitter, etc.)
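The resource-oriented style can be sketched with an in-memory store in which every resource is named by a URI and manipulated through the four methods. The `handle` function, the URIs, and the choice of status codes are illustrative assumptions, not a description of any particular service.

```python
# In-memory resource store keyed by URI; names and URIs are illustrative.
resources = {}
next_id = 1

def handle(method, uri, body=None):
    """Return (status_code, payload) for one request against the store."""
    global next_id
    if method == "GET":
        return (200, resources[uri]) if uri in resources else (404, None)
    if method == "PUT":        # create or replace the resource at a known URI
        created = uri not in resources
        resources[uri] = body
        return (201 if created else 200, body)
    if method == "POST":       # create a new resource under a collection URI
        new_uri = f"{uri}/{next_id}"
        next_id += 1
        resources[new_uri] = body
        return (201, new_uri)
    if method == "DELETE":
        if uri in resources:
            del resources[uri]
            return (204, None)
        return (404, None)
    return (405, None)         # method not allowed
```

For example, `handle("PUT", "/users/alice", {"name": "Alice"})` stores the full state of the resource, and a later `handle("GET", "/users/alice")` returns it. Each request carries everything needed to process it.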
  • 67. Outline
      1 Introduction
      2 Client-server Interaction
      3 Characteristics
      4 Message Passing Interface
  • 74. Clocks
      - Distributed systems need to be able to:
          - Order events produced by concurrent processes
          - Synchronize senders and receivers of messages
          - Serialize concurrent accesses to shared objects
          - Coordinate joint activity
      - Clocks are employed for this
      - But quartz oscillators oscillate at slightly different frequencies, leading to clock drift and resulting in clock skew between clocks
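A quick calculation shows how drift accumulates into skew. The drift rates below (50 parts per million in opposite directions) are illustrative values rather than figures from the lecture, and the two clocks are assumed to start out in agreement.

```python
def skew_after(elapsed_s, drift_a_ppm, drift_b_ppm):
    """Skew in seconds between two clocks after `elapsed_s` seconds of real time.

    A drift of +50 ppm means the clock gains 50 microseconds per second.
    """
    return elapsed_s * (drift_a_ppm - drift_b_ppm) * 1e-6

one_day = 24 * 60 * 60
skew = skew_after(one_day, +50, -50)  # roughly 8.64 seconds after one day
```

Even small per-second errors therefore add up to seconds of disagreement within a day unless the clocks are resynchronized.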
  • 79. Clock synchronization
      - Clock synchronization algorithms try to minimize skew between a set of clocks
          - Decide upon a correct time
          - Communicate to agree (compensating for delays)
          - Possibly multiple servers involved
      - In reality, still a 1-10 ms skew after sync (but we can live with that)
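One well-known way to compensate for network delay is Cristian's algorithm, named here as an example since the slides do not single out an algorithm. The sketch assumes the request and the reply spend equal time on the network; the function name and the millisecond readings are illustrative.

```python
def estimate_offset(t_send, server_time, t_recv):
    """Estimate how far the local clock is behind the server's clock.

    t_send, t_recv: local clock readings when the request left and the
    reply arrived; server_time: the timestamp carried in the reply.
    Assumes the request and the reply take equally long on the network.
    """
    round_trip = t_recv - t_send
    server_now = server_time + round_trip / 2  # compensate for the reply delay
    return server_now - t_recv

# Illustrative readings in milliseconds.
offset = estimate_offset(t_send=1000, server_time=6010, t_recv=1020)
```

With these readings the round trip is 20 ms and the estimated offset is 5000 ms. If the two directions are not equally slow, the estimate is off by up to half the round-trip time, which is one source of the residual skew.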
  • 85. Ordering
      - Time is used to ensure ordering
          - Withdraw money at 23:59.45
          - Bank calculates interest at 00:00.0
          - The withdrawn money should not be included in the interest calculation
      - In most cases, only need to know that a happened before b, known as the happens-before relation
      - Multiple algorithms exist to ensure the happens-before relation
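Lamport logical clocks are one such algorithm, picked here as an example because the slides do not name one. Each process keeps a counter, attaches it to outgoing messages, and advances past any timestamp it receives. The `atm` and `bank` processes are hypothetical stand-ins for the withdrawal scenario.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the timestamp travels with the message.
        return self.local_event()

    def receive(self, message_time):
        # Jump past the sender's timestamp so send happens-before receive.
        self.time = max(self.time, message_time) + 1
        return self.time

atm, bank = LamportClock(), LamportClock()
withdrawal = atm.send()               # withdrawal request leaves the ATM
received = bank.receive(withdrawal)   # bank learns of the withdrawal
interest = bank.local_event()         # interest is calculated afterwards
```

The timestamps satisfy `withdrawal < received < interest`, so the interest calculation is ordered after the withdrawal without relying on synchronized wall clocks.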
  • 89. Distributed Mutual Exclusion
      - Concurrent access to shared resources needs to be synchronized
      - Need hardware support on the local machine
          - Locks, semaphores, etc.
      - But this support is not available across a distributed system
  • 92. Distributed Mutual Exclusion (2)
      - Multiple methods exist to ensure mutual exclusion:
          - Central lock server: All lock requests are handled by a central server
          - Token passing: Nodes are arranged into a ring and a token is passed around
          - Totally-ordered multicast: Clients multicast requests to each other
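The first of these methods, the central lock server, is the simplest to sketch. The class below is an in-process illustration with hypothetical names; in a real deployment `acquire` and `release` would be requests sent over the network.

```python
from collections import deque

class LockServer:
    """Grants one lock to one client at a time, first come first served."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def acquire(self, client):
        if self.holder is None:
            self.holder = client
            return True           # granted immediately
        self.waiting.append(client)
        return False              # queued until the holder releases

    def release(self, client):
        if client != self.holder:
            raise ValueError(f"{client} does not hold the lock")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder        # next holder, or None if nobody is waiting
```

The design is easy to reason about, but the server is a single point of failure and a potential bottleneck, which is part of the motivation for the token-passing and multicast alternatives.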
  • 96. Consensus
      - Getting processes in a distributed system to agree on something
      - Requirements for a correct solution:
          - Agreement: All nodes arrive at the same answer
          - Validity: Answer is one that was proposed by someone
          - Termination: All nodes eventually decide
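The three requirements can be stated as checks over the outcome of a single run. The helper below is not a consensus algorithm, only a sketch of what a correct one must guarantee; `check_consensus` and the node names are illustrative.

```python
def check_consensus(proposals, decisions):
    """Check one finished run against the three requirements.

    proposals: {node: proposed value}
    decisions: {node: decided value, or None if the node never decided}
    """
    decided = [v for v in decisions.values() if v is not None]
    return {
        "agreement": len(set(decided)) <= 1,
        "validity": all(v in proposals.values() for v in decided),
        # On a finished run, termination means no node is left undecided.
        "termination": all(decisions.get(n) is not None for n in proposals),
    }

good_run = check_consensus(
    {"n1": "x", "n2": "y", "n3": "x"},
    {"n1": "x", "n2": "x", "n3": "x"},
)
```

Agreement and validity are safety properties that can be checked at any point in a run. Termination is a liveness property, so a finite trace can only show that every node has decided so far.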
  • 104. Distributed transactions: Composite operations (i.e., a collection of reads and updates to a set of objects). A transaction is atomic: if it commits, all operations are applied; if it aborts, no state is mutated at all. Distributed transactions span multiple transaction processing servers. For instance, booking flights Lahore -> Dubai -> New York: the entire trip needs to be booked, so actions need to be coordinated across multiple parties.
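One common way to get this all-or-nothing behaviour across servers is a two-phase commit, which the slide does not name; it is used here only as an illustration. The sketch below runs in a single process and ignores crashes, timeouts, and message loss. The class `FlightServer`, the function `book_trip`, and the seat counts are hypothetical.

```python
from typing import List


class FlightServer:
    """Hypothetical booking server for one leg of the trip."""

    def __init__(self, leg: str, seats: int) -> None:
        self.leg = leg
        self.seats = seats  # seats still for sale
        self.held = 0       # seats tentatively held by prepared transactions

    def prepare(self) -> bool:
        # Phase 1: hold a seat and vote yes only if one is free.
        if self.seats - self.held > 0:
            self.held += 1
            return True
        return False

    def commit(self) -> None:
        # Phase 2 (commit): turn the held seat into a sold seat.
        self.held -= 1
        self.seats -= 1

    def abort(self) -> None:
        # Phase 2 (abort): release the hold, leaving the seat count unchanged.
        self.held -= 1


def book_trip(servers: List[FlightServer]) -> bool:
    """Book one seat on every leg, or on none of them."""
    prepared: List[FlightServer] = []
    for server in servers:
        if server.prepare():
            prepared.append(server)
        else:
            # One leg voted no: undo every hold taken so far.
            for p in prepared:
                p.abort()
            return False
    # Every leg voted yes: apply the booking everywhere.
    for server in servers:
        server.commit()
    return True


lahore_dubai = FlightServer("Lahore -> Dubai", seats=1)
dubai_new_york = FlightServer("Dubai -> New York", seats=0)

# The second leg is full, so the whole trip aborts and nothing changes.
first_try = book_trip([lahore_dubai, dubai_new_york])
seats_after_abort = (lahore_dubai.seats, lahore_dubai.held)

# Once a seat opens up on the second leg, the whole trip commits.
dubai_new_york.seats = 2
second_try = book_trip([lahore_dubai, dubai_new_york])
```

After the aborted attempt the first leg still has its seat, which is the "no state mutation at all" property; after the committed attempt both legs have one seat fewer.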
  • 111. Replication: A number of distributed systems involve replication. Data replication: multiple copies of some object are stored at different servers. Computation replication: multiple servers are capable of providing an operation. Advantages: (1) Load balancing: work is spread out across the replicas. (2) Lower latency: better performance if a replica is close to the client. (3) Fault tolerance: the failure of some replicas can be tolerated. Examples: DNS, content distribution networks, database replication, etc.
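The latency and fault-tolerance advantages can be seen in how a client might pick a replica. This is a minimal sketch under stated assumptions: the `Replica` record, the `pick_replica` function, and the latency figures are made up for illustration, and load balancing is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """Hypothetical replica as seen by one client."""
    name: str
    latency_ms: float   # assumed round-trip time from this client
    alive: bool = True


def pick_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Return the closest live replica, or None if every replica is down."""
    live = [r for r in replicas if r.alive]
    if not live:
        return None
    return min(live, key=lambda r: r.latency_ms)


replicas = [
    Replica("lahore", latency_ms=5.0),
    Replica("dubai", latency_ms=40.0),
    Replica("new-york", latency_ms=180.0),
]

# Lower latency: the client is served by the nearest replica.
nearest = pick_replica(replicas).name

# Fault tolerance: if that replica fails, the next closest takes over.
replicas[0].alive = False
fallback = pick_replica(replicas).name

# With no replica left, the operation cannot be served at all.
for r in replicas:
    r.alive = False
nobody = pick_replica(replicas)
```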
  • 117. CAP: (1) Consistency: all nodes see the same state. (2) Availability: all requests get a response. (3) Partition tolerance: the system continues to operate even when messages between nodes are lost or nodes fail. Brewer's conjecture states that a distributed system can provide only 2 of these 3 properties at once. In the current setup, partitioning is a given: hardware and software fail all the time. Therefore, systems need to choose between consistency and availability.
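The choice between consistency and availability can be made concrete with a toy pair of replicas whose link can be cut. This is a sketch under simplifying assumptions, not a model of any real system: the class `ReplicaPair` and the mode labels "CP" (stay consistent, refuse requests) and "AP" (stay available, allow divergence) are introduced here for illustration.

```python
class ReplicaPair:
    """Two replicas of one value, joined by a link that can be cut."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = ["v0", "v0"]
        self.partitioned = False

    def write(self, replica: int, value: str) -> bool:
        """Try to write through one replica; return True if accepted."""
        if not self.partitioned:
            # Link is up: the write reaches both replicas.
            self.values = [value, value]
            return True
        if self.mode == "CP":
            # Cannot reach the peer: refuse, giving up availability.
            return False
        # Accept locally: stays available, but the replicas now differ.
        self.values[replica] = value
        return True

    def consistent(self) -> bool:
        return self.values[0] == self.values[1]


cp = ReplicaPair("CP")
cp.partitioned = True
cp_accepted = cp.write(0, "v1")      # refused during the partition
cp_consistent = cp.consistent()      # both replicas still hold "v0"

ap = ReplicaPair("AP")
ap.partitioned = True
ap_accepted = ap.write(0, "v1")      # accepted during the partition
ap_consistent = ap.consistent()      # replicas hold "v1" and "v0"
```

While the link is up both modes behave identically; the trade-off only appears once the partition occurs.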
  • 118. References: George Coulouris, Jean Dollimore, Tim Kindberg, and Gordon Blair. 2011. Distributed Systems: Concepts and Design (5th ed.). Addison-Wesley Publishing Company, USA.