Topic 3: Large-scale Distributed Systems

Cloud Computing Workshop 2013, ITU


  1. 3: Large-scale Distributed Systems, Zubair Nabi, zubair.nabi@itu.edu.pk, April 17, 2013
  2. Outline
      1. Introduction
      2. Client-server Interaction
      3. Characteristics
      4. Message Passing Interface
  8. Distributed Systems
      • Set of discrete machines which cooperate to perform computation
      • Give the notion of a single “machine”
      • Examples:
          • Compute clusters
          • Distributed storage systems, such as Dropbox, Google Drive, etc.
          • The Web
  16. Advantages
      • Scalability:
          • The scale of the Internet (think how many queries Google servers handle daily)
          • Only a matter of adding more machines
          • Cheaper than supercomputers
          • More machines mean more parallelism, hence better performance
      • Sharing: The same resource is shared between multiple users
          • Just like the Internet is shared between millions of users
      • Communication: Communication between (potentially geographically isolated) machines and users (via email, Facebook, etc.)
      • Reliability: The service can remain active even if multiple machines go down
  21. Challenges
      • Concurrency: Concurrent execution requires some form of coordination
      • Fault-tolerance: Any component can fail at any instant due to a software or hardware bug
      • Security: A single compromised machine can compromise the entire system
      • Coordination: There is no global time, so coordination is non-trivial
      • Troubleshooting: Hard to troubleshoot because it is hard to reason about the system
  26. Transparency
      • Distributed systems give the notion of a single machine, i.e. they keep the distribution transparent
      • The degree of this transparency can be mapped onto an entire spectrum of options for both users and programmers
      • For instance: A web user is aware of network communication, but the number of accessed machines is transparent
      • Transparency can be ensured by middleware that adds a layer of abstraction
      • Can span access, concurrency, failure, location, migration, persistence, relocation, and replication
  35. Request-reply protocol
      • Standard operation:
          1. Client sends a request to the server
          2. Server processes the request and sends a corresponding response
      • In the synchronous model, the client blocks until the response is received
      • In the asynchronous model, the client continues its execution
      • For instance, HTTP 1.0:
          1. Client sends GET /index.html
          2. Server responds with index.html
          3. Client renders index.html
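
The HTTP 1.0 exchange above can be sketched with Python's standard library. This is only an illustration under stated assumptions: `PAGE`, `IndexHandler`, `start_server`, and `http10_get` are hypothetical names, the server is a local toy that stands in for a real web server, and "rendering" is reduced to returning the body. The last part shows one possible reading of the asynchronous model, where the request runs in a worker thread and the client keeps executing.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>hello</body></html>"  # hypothetical stand-in for index.html


class IndexHandler(BaseHTTPRequestHandler):
    """Toy server: answers GET /index.html with PAGE, anything else with 404."""

    def do_GET(self):
        if self.path == "/index.html":
            self.send_response(200)
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)
        else:
            self.send_error(404)

    def log_message(self, *args):  # keep the example quiet
        pass


def start_server():
    """Start the toy server on an OS-chosen local port; return (server, port)."""
    server = HTTPServer(("127.0.0.1", 0), IndexHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def http10_get(port, path):
    """Synchronous request-reply: send one request, block until the reply ends."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        chunks = []
        while True:  # HTTP 1.0: the server closes the connection after replying
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    return status_line, body


if __name__ == "__main__":
    server, port = start_server()
    # Synchronous model: the caller blocks here until the reply arrives.
    print(http10_get(port, "/index.html"))
    # Asynchronous model: the caller gets a future and keeps executing.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(http10_get, port, "/index.html")
        print("client keeps working while the request is in flight")
        print(future.result())
    server.shutdown()
```

Requesting a path the toy server does not know, such as `/missing.html`, yields a 404 status line rather than an exception.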
  42. Errors and failures
      • Errors are handled at the application level
          • For instance, if the client requests a non-existent web page, the server just returns a special reply: 404 Not Found
      • Failures are system-level events
          • For instance, a lost message, a client/server crash, etc.
      • To handle failures, the client must time out after an interval T
      • The client can retry on a timeout
      • Setting the value of T is system-specific
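
The timeout-and-retry rule can be sketched as follows. This is a minimal illustration, not a prescribed implementation: it assumes UDP on localhost, and `lossy_echo_server`, `start_lossy_server`, and `request_with_retry` are hypothetical names. The toy server simulates lost messages by ignoring its first few requests; `timeout_s` plays the role of T.

```python
import socket
import threading


def lossy_echo_server(sock, drop_first, stop):
    """Toy UDP server that ignores the first `drop_first` requests (simulated loss)."""
    seen = 0
    with sock:
        sock.settimeout(0.05)  # wake up regularly to check the stop flag
        while not stop.is_set():
            try:
                data, addr = sock.recvfrom(4096)
            except socket.timeout:
                continue
            seen += 1
            if seen > drop_first:
                sock.sendto(b"reply:" + data, addr)


def start_lossy_server(drop_first):
    """Bind the toy server to an OS-chosen local port; return (address, stop_event)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    stop = threading.Event()
    threading.Thread(
        target=lossy_echo_server, args=(sock, drop_first, stop), daemon=True
    ).start()
    return sock.getsockname(), stop


def request_with_retry(addr, payload, timeout_s, max_attempts):
    """Send `payload`, wait up to `timeout_s` (the slide's T), retry on timeout.

    Returns (reply, attempts_used); raises TimeoutError if every attempt times out.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        for attempt in range(1, max_attempts + 1):
            sock.sendto(payload, addr)
            try:
                reply, _ = sock.recvfrom(4096)
                return reply, attempt
            except socket.timeout:
                continue  # no reply within T: treat it as a failure and retry
    raise TimeoutError(f"no reply after {max_attempts} attempts")


if __name__ == "__main__":
    addr, stop = start_lossy_server(drop_first=2)
    print(request_with_retry(addr, b"ping", timeout_s=0.2, max_attempts=5))
    stop.set()
```

A client cannot tell a lost request from a lost reply, so a retry may make the server process the same request twice; the sketch ignores this.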
  50. Remote Procedure Call
      • Request/response protocols are widely used but too low-level
          • Each request must be defined separately, including its network message representation
      • Remote procedure call (RPC) presents a simpler abstraction
          • The programmer invokes a procedure which executes on a remote machine (the server)
          • The RPC subsystem takes care of message formats, communication, timeouts, etc.
          • Distribution of the system becomes transparent
      • Integrated with the programming language
          • The RPC layer adds stubs at the client end which, when invoked, execute a method at the server
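
The stub idea can be sketched in a few lines. This is a toy under stated assumptions, not how any particular RPC system works: messages are encoded as JSON, the "network" is an in-process function call, and `Calculator`, `server_dispatch`, and `ClientStub` are hypothetical names.

```python
import json


class Calculator:
    """Hypothetical server-side object whose methods are exposed remotely."""

    def add(self, a, b):
        return a + b


def server_dispatch(target, request_bytes):
    """Server side: unmarshal the request, run the method, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    try:
        method = getattr(target, request["method"])
        reply = {"result": method(*request["params"])}
    except Exception as exc:  # application-level error travels back as data
        reply = {"error": str(exc)}
    return json.dumps(reply).encode("utf-8")


class ClientStub:
    """Client stub: any attribute access becomes a remote call over `transport`."""

    def __init__(self, transport):
        self._transport = transport  # callable: request bytes -> reply bytes

    def __getattr__(self, name):
        def remote_call(*params):
            request = json.dumps({"method": name, "params": list(params)})
            raw_reply = self._transport(request.encode("utf-8"))
            reply = json.loads(raw_reply.decode("utf-8"))
            if "error" in reply:
                raise RuntimeError(reply["error"])
            return reply["result"]

        return remote_call


if __name__ == "__main__":
    calc = Calculator()
    # In-process transport stands in for the network in this sketch.
    stub = ClientStub(lambda request: server_dispatch(calc, request))
    print(stub.add(2, 3))  # reads like a local call, travels as a JSON message
```

The caller only sees `stub.add(2, 3)`; the message format and dispatch stay inside the stub and the server routine, which is the transparency the slide describes. Dispatching through `getattr` exposes every attribute of the target, so a real system would restrict which methods are callable.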
  55. Example: XML-RPC
      - XML is used to encode method invocations (method names, parameters, etc.)
      - HTTP POST is used to send the request and receive the response (also encoded in XML)
      - Looks like a regular web session on the wire, so it plays well with middleboxes
      - Language agnostic and extensible
      - Extended with more features (namespaces, user-defined types, etc.) and diverse transports (TCP, UDP, etc.) to result in the Simple Object Access Protocol (SOAP)
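To see what the XML encoding looks like, Python's standard `xmlrpc.client` module can marshal a call without any network involved. The method name `add` and its arguments are assumptions made for this sketch; in a real exchange the request string would be the body of an HTTP POST.

```python
import xmlrpc.client

# Client side: encode the invocation add(2, 3) as an XML-RPC request body.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")
print(request_xml)

# Server side: decode the request, compute, and encode the response.
params, method = xmlrpc.client.loads(request_xml)
response_xml = xmlrpc.client.dumps((sum(params),), methodresponse=True)

# Client side again: decode the response (a one-element tuple of results).
(result,), _ = xmlrpc.client.loads(response_xml)
print(method, params, result)
```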
  61. RPC shortcomings
      - Traditional RPC mechanisms are synchronous
        - The client blocks until the response is received
        - Poor responsiveness, especially in high-latency networks
      - The mid-2000s ushered in the age of Asynchronous JavaScript and XML (AJAX)
        - Update a web page without reloading it
        - For instance, Google Maps, Gmail, etc.
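The cost of blocking can be illustrated with a simulated remote call. This is a sketch under stated assumptions: `slow_remote_call` and its 50 ms latency are hypothetical stand-ins for a network round trip, and a thread pool stands in for whatever asynchronous mechanism a real client would use.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_remote_call(x, latency=0.05):
    """Hypothetical remote call: sleeps to imitate network latency."""
    time.sleep(latency)
    return x * 2

# Synchronous style: each call blocks until its response arrives.
start = time.perf_counter()
sync_results = [slow_remote_call(i) for i in range(4)]
sync_elapsed = time.perf_counter() - start

# Asynchronous style: issue all calls, then collect responses as they complete.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(slow_remote_call, i) for i in range(4)]
    async_results = [f.result() for f in futures]
async_elapsed = time.perf_counter() - start

print(f"synchronous: {sync_elapsed:.2f} s, asynchronous: {async_elapsed:.2f} s")
```

Both versions return the same results; the synchronous one pays the latency once per call, while the asynchronous one overlaps the waits.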
  66. Representational State Transfer
      - AJAX still revolves around RPC (just asynchronously)
      - Representational State Transfer (REST) offers an alternative
        - All resources have a name: a URL or URI
        - Resources are manipulated with the HTTP methods PUT, GET, POST, and DELETE
        - State is sent along with operations, so each request carries what the server needs to process it
        - Widely used these days (for instance, by Amazon, Twitter, etc.)
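A toy in-memory handler shows how the four methods map onto named resources. This is only a sketch: the `handle` function, the `/users` collection, and the choice of status codes are assumptions for illustration, and no HTTP server is involved.

```python
import itertools

resources = {}             # URI -> representation of the resource
_ids = itertools.count(1)  # hypothetical ID generator for new resources

def handle(method, uri, body=None):
    """Apply one request to the resource store and return (status, payload)."""
    if method == "GET":
        return (200, resources[uri]) if uri in resources else (404, None)
    if method == "PUT":  # create or replace the resource at exactly this URI
        status = 200 if uri in resources else 201
        resources[uri] = body
        return (status, body)
    if method == "POST":  # create a new resource under a collection URI
        new_uri = f"{uri}/{next(_ids)}"
        resources[new_uri] = body
        return (201, new_uri)
    if method == "DELETE":
        if uri in resources:
            del resources[uri]
            return (204, None)
        return (404, None)
    return (405, None)  # method not supported

status, uri = handle("POST", "/users", {"name": "Ada"})
print(status, uri)
handle("PUT", uri, {"name": "Ada Lovelace"})
print(handle("GET", uri))
handle("DELETE", uri)
print(handle("GET", uri))
```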
  74. Clocks
      - Distributed systems need to be able to:
        - Order events produced by concurrent processes
        - Synchronize senders and receivers of messages
        - Serialize concurrent accesses to shared objects
        - Coordinate joint activity
      - Clocks are employed for this
      - But quartz oscillators oscillate at slightly different frequencies, leading to clock drift and resulting in skew between clocks
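A quick calculation shows how drift turns into skew. The drift rate of 50 parts per million is an assumed figure chosen for illustration, not a value from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def accumulated_drift(drift_ppm, elapsed_seconds):
    """Seconds gained (positive) or lost (negative) at the given drift rate."""
    return drift_ppm * 1e-6 * elapsed_seconds

# Assumption: one clock runs 50 ppm fast, the other 50 ppm slow.
fast = accumulated_drift(+50, SECONDS_PER_DAY)
slow = accumulated_drift(-50, SECONDS_PER_DAY)
skew = fast - slow
print(f"skew after one day without synchronization: {skew:.2f} s")
```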
  79. Clock synchronization
      - Clock synchronization algorithms try to minimize skew between a set of clocks
        - Decide upon a correct time
        - Communicate to agree (compensating for delays)
        - Possibly multiple servers involved
      - In reality, there is still a 1-10 ms skew after synchronization (but we can live with that)
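One well-known way to compensate for delays is Cristian's algorithm, which the slides do not name; it is used here only as an example. The sketch assumes the request and the reply take equally long, and the timestamps are made-up values.

```python
def estimate_offset(t_send, t_server, t_recv):
    """Estimate how far the local clock is behind the server clock.

    t_send, t_recv: local clock readings when the request left and the reply arrived.
    t_server: server clock reading carried in the reply.
    Returns (offset, error_bound), both in seconds.
    """
    round_trip = t_recv - t_send
    # Assumption: the reply spent half of the round trip on the wire.
    estimated_server_now = t_server + round_trip / 2
    return estimated_server_now - t_recv, round_trip / 2

offset, bound = estimate_offset(t_send=100.000, t_server=100.120, t_recv=100.040)
print(f"adjust local clock by {offset:+.3f} s (uncertainty about {bound:.3f} s)")
```

The uncertainty is bounded by half the round-trip time, which is consistent with some skew remaining after synchronization.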
  85. Ordering
      - Time is used to ensure ordering
        - Withdraw money at 23:59:45
        - Bank calculates interest at 00:00:00
        - The withdrawn money should not be included in the interest calculation
      - In most cases, we only need to know that a happened before b, known as the happens-before relation
      - Multiple algorithms exist to track the happens-before relation
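Lamport logical clocks are one such algorithm; the slides do not name a specific one, so this is an illustrative choice. The class and method names are assumptions for the sketch.

```python
class LamportClock:
    """Logical clock: a counter that respects the happens-before relation."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the counter."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time):
        """On receipt, jump past both the local time and the message timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
a = p.tick()                # event a on process p
sent = p.send()             # p sends a message after a
q.tick()                    # unrelated local event on q
b = q.receive(sent)         # event b: q receives the message
print(a, sent, b)           # a happened before b, so a < sent < b
```

If a happened before b, the timestamp of a is smaller than that of b; the converse does not hold in general for this scheme.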
  89. Distributed Mutual Exclusion
      - Concurrent access to shared resources needs to be synchronized
      - On a local machine this relies on hardware support
        - Locks, semaphores, etc.
      - But this support is not available across a distributed system
  92. Distributed Mutual Exclusion (2)
      - Multiple methods exist to ensure this:
        - Central lock server: all lock requests are handled by a central server
        - Token passing: nodes are arranged into a ring and a token is passed around
        - Totally-ordered multicast: clients multicast requests to each other
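The first method, a central lock server, is the simplest to sketch. The `LockServer` class and client names are hypothetical, and requests are plain method calls rather than network messages.

```python
from collections import deque

class LockServer:
    """Grants one lock to one client at a time; others wait in FIFO order."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def acquire(self, client):
        """Return True if the lock is granted now, False if the client is queued."""
        if self.holder is None:
            self.holder = client
            return True
        self.waiting.append(client)
        return False

    def release(self, client):
        """Release the lock and hand it to the next waiter; return the new holder."""
        if self.holder != client:
            raise ValueError(f"{client} does not hold the lock")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder

server = LockServer()
print(server.acquire("A"))   # granted immediately
print(server.acquire("B"))   # queued behind A
print(server.release("A"))   # lock passes to B
```

The design is easy to reason about, but the server is a single point of failure and a potential bottleneck, which is what motivates the other two methods.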
  96. Consensus
      - Getting processes in a distributed system to agree on something
      - Requirements for a correct solution:
        - Agreement: all nodes arrive at the same answer
        - Validity: the answer is one that was proposed by someone
        - Termination: all nodes eventually decide
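The three requirements can be written down as checks over the outcome of a finished run. This sketch is not a consensus algorithm; the `check_consensus` function and the node names are assumptions, and termination is checked only in the sense that every node has decided by the time the run is inspected.

```python
def check_consensus(proposals, decisions):
    """proposals: node -> proposed value; decisions: node -> decided value or None."""
    decided = [value for value in decisions.values() if value is not None]
    return {
        # Agreement: no two nodes decided on different values.
        "agreement": len(set(decided)) <= 1,
        # Validity: every decided value was proposed by some node.
        "validity": all(value in proposals.values() for value in decided),
        # Termination: every node that proposed has decided.
        "termination": all(decisions.get(node) is not None for node in proposals),
    }

proposals = {"n1": "red", "n2": "blue", "n3": "red"}
good_run = {"n1": "red", "n2": "red", "n3": "red"}
bad_run = {"n1": "red", "n2": "blue", "n3": None}
print(check_consensus(proposals, good_run))
print(check_consensus(proposals, bad_run))
```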
  104. Distributed transactions
      - Composite operations (i.e. a collection of reads and updates to a set of objects)
      - A transaction is atomic
        - If it commits, all operations are applied
        - If it aborts, no state mutation at all
      - Distributed transactions span multiple transaction processing servers
        - For instance, booking flights: Lahore -> Dubai -> New York
        - Need to book the entire trip
        - Actions need to be coordinated across multiple parties
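Two-phase commit is a common way to coordinate such a booking; the slides do not name a protocol, so it is used here as an illustrative choice. The `Participant` class is hypothetical, and failures of the coordinator or the network are ignored.

```python
class Participant:
    """One transaction processing server, e.g. the booking system for one leg."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        """Phase 1: vote yes only if the local part of the work can be committed."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    if all(votes):
        for p in participants:                    # phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:                        # phase 2: abort everywhere
        p.abort()
    return "aborted"

# Assumption: the second leg has no seats left, so the whole trip must abort.
legs = [Participant("Lahore -> Dubai"), Participant("Dubai -> New York", can_commit=False)]
outcome = two_phase_commit(legs)
print(outcome, [(p.name, p.state) for p in legs])
```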
  111. Replication
      - A number of distributed systems involve replication
        - Data replication: multiple copies of some object stored at different servers
        - Computation replication: multiple servers capable of providing an operation
      - Advantages:
        1. Load balancing: work is spread out across replicas
        2. Lower latency: better performance if a replica is close to the client
        3. Fault tolerance: failure of some replicas can be tolerated
      - Examples: DNS, content distribution networks, database replication, etc.
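The latency and fault-tolerance advantages can be seen in a small read routine that picks the closest live replica. The replica names, latencies, and record layout are all made up for this sketch.

```python
def read(key, replicas):
    """Read from the lowest-latency replica that is alive."""
    for replica in sorted(replicas, key=lambda r: r["latency_ms"]):
        if replica["alive"]:
            return replica["name"], replica["data"][key]
    raise RuntimeError("no live replica available")

# Hypothetical replicas holding the same copy of the data.
replicas = [
    {"name": "far", "latency_ms": 180, "alive": True, "data": {"k": "v"}},
    {"name": "near", "latency_ms": 12, "alive": True, "data": {"k": "v"}},
    {"name": "mid", "latency_ms": 60, "alive": True, "data": {"k": "v"}},
]

print(read("k", replicas))        # served by the nearest replica
replicas[1]["alive"] = False      # the nearest replica fails
print(read("k", replicas))        # the read still succeeds from the next closest
```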
  117. CAP
      - CAP:
        1. Consistency: all nodes see the same state
        2. Availability: all requests get a response
        3. Partition tolerance: the system continues to operate even when failures cut nodes off from one another
      - Brewer’s conjecture states that a distributed system can provide only 2 of the 3 at the same time
      - In the current setup, partitioning is a given: hardware/software fails all the time
      - Therefore, systems need to choose between consistency and availability
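The choice between consistency and availability can be made concrete with two replicas and a simulated partition. This is a toy model: the `Replica` class, the `write` function, and the `"CP"`/`"AP"` mode labels are assumptions for illustration.

```python
class Replica:
    def __init__(self):
        self.value = None

def write(value, local, remote, partitioned, mode):
    """Try to write through `local`; return True if the request was accepted."""
    if not partitioned:
        local.value = remote.value = value   # normal case: both copies updated
        return True
    if mode == "CP":
        return False                         # stay consistent, give up availability
    local.value = value                      # "AP": stay available, copies diverge
    return True

a, b = Replica(), Replica()
print(write("x", a, b, partitioned=False, mode="CP"), a.value, b.value)
print(write("y", a, b, partitioned=True, mode="CP"), a.value, b.value)
print(write("z", a, b, partitioned=True, mode="AP"), a.value, b.value)
```

During the partition, the consistency-first mode rejects the write, while the availability-first mode accepts it and leaves the two replicas with different values to reconcile later.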
