Topic 3: Large-scale Distributed Systems
Zubair Nabi
zubair.nabi@itu.edu.pk
April 17, 2013
Zubair Nabi 3: Large-scale Distributed Systems April 17, 2013 1 / 29
Outline
1 Introduction
2 Client-server Interaction
3 Characteristics
4 Message Passing Interface
Distributed Systems
A set of discrete machines that cooperate to perform computation
Give the illusion of a single “machine”
Examples:
Compute clusters
Distributed storage systems, such as Dropbox, Google Drive, etc.
The Web
Advantages
Scalability:
The scale of the Internet (think how many queries Google servers
handle daily)
Scaling out is largely a matter of adding more machines
Cheaper than supercomputers
More machines mean more parallelism and hence better performance
Sharing:
The same resource is shared between multiple users
Just like the Internet is shared between millions of users
Communication:
Communication between (potentially geographically distant) machines
and users (via email, Facebook, etc.)
Reliability:
The service can remain available even if multiple machines go down
Challenges
Concurrency:
Concurrent execution requires some form of coordination
Fault tolerance:
Any component can fail at any instant due to a software bug or a
hardware fault
Security:
A single compromised machine can compromise the entire system
Coordination:
There is no global clock, so coordination is non-trivial
Troubleshooting:
Hard to troubleshoot because the system is hard to reason about
Transparency
Distributed systems give the illusion of a single machine, i.e., they keep
the distribution transparent
The degree of transparency lies on a spectrum, for both users and
programmers
For instance:
A web user is aware of network communication, but the number of
machines accessed is transparent
Transparency can be provided by middleware that adds a layer of
abstraction
It can span access, concurrency, failure, location, migration,
persistence, relocation, and replication
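As an illustration of location and migration transparency, here is a minimal Python sketch of a middleware layer. The `NameService` and `Middleware` classes, the logical names, and the server addresses are hypothetical, and the "servers" are in-memory dicts rather than real machines:

```python
class NameService:
    """Maps logical names to the (hypothetical) address of the server holding them."""

    def __init__(self) -> None:
        self._where: dict[str, str] = {}

    def bind(self, name: str, address: str) -> None:
        self._where[name] = address

    def resolve(self, name: str) -> str:
        return self._where[name]


class Middleware:
    """Clients ask for objects by logical name only; the middleware locates them."""

    def __init__(self, names: NameService, servers: dict[str, dict[str, str]]) -> None:
        self.names = names
        self.servers = servers  # address -> that server's local store

    def read(self, name: str) -> str:
        address = self.names.resolve(name)  # location is hidden from the caller
        return self.servers[address][name]

    def migrate(self, name: str, new_address: str) -> None:
        old_address = self.names.resolve(name)
        self.servers[new_address][name] = self.servers[old_address].pop(name)
        self.names.bind(name, new_address)  # callers never see the move


if __name__ == "__main__":
    servers = {"10.0.0.1": {"report.txt": "Q1 numbers"}, "10.0.0.2": {}}
    names = NameService()
    names.bind("report.txt", "10.0.0.1")
    mw = Middleware(names, servers)
    print(mw.read("report.txt"))
    mw.migrate("report.txt", "10.0.0.2")
    print(mw.read("report.txt"))  # same call, same result, different machine
```

The client code calls `read` the same way before and after the object moves; only the middleware's name table changes.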
Request-reply protocol
Standard operation
1 Client sends a request to the server
2 Server processes the request and sends a corresponding response
In the synchronous model, the client blocks until the response is received
In the asynchronous model, the client continues its execution and handles
the response when it arrives
For instance: HTTP 1.0
1 Client sends GET /index.html
2 Server responds with index.html
3 Client renders index.html
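The synchronous and asynchronous request-reply models can be contrasted with a small Python sketch. It is an in-process simulation under stated assumptions: `server_handle`, `sync_call`, `async_call`, and the page contents are hypothetical, there is no real network I/O, and a short sleep stands in for processing plus latency:

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Toy stand-in for an HTTP/1.0 server's content (hypothetical).
PAGES = {"/index.html": "<html><body>Hello</body></html>"}


def server_handle(request: str) -> str:
    """Step 2: process the request and produce a corresponding response."""
    method, path = request.split()[:2]
    time.sleep(0.05)  # simulated processing time plus network latency
    if method == "GET" and path in PAGES:
        return "200 OK\n" + PAGES[path]
    return "404 Not Found"


def sync_call(request: str) -> str:
    """Synchronous model: the client blocks until the response arrives."""
    return server_handle(request)


def async_call(pool: ThreadPoolExecutor, request: str) -> Future:
    """Asynchronous model: the client gets a Future back and keeps executing."""
    return pool.submit(server_handle, request)


if __name__ == "__main__":
    print(sync_call("GET /index.html"))  # blocks for about 50 ms
    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = async_call(pool, "GET /index.html")
        print("client continues while the request is in flight")
        print(fut.result())  # collect the response once it is needed
```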
Errors and failures
Errors are handled at the application level
For instance, if the client requests a non-existent web page, the server
simply returns a special reply: 404 Not Found
Failures are system-level events
For instance, lost messages, client/server crashes, etc.
To handle failures, the client must time out after some period T
The client can retry on a timeout
The value of T is system-specific
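The split between application-level errors and system-level failures can be sketched in Python as a timeout-and-retry loop. This is a minimal sketch under stated assumptions: `FlakyTransport` and `get_with_retry` are hypothetical, and a lost message is modelled by raising `TimeoutError` instead of waiting on a real socket timeout:

```python
from typing import Callable


class FlakyTransport:
    """Hypothetical transport that loses the first `drops` requests.

    A lost request or reply is modelled as the client's timer T expiring,
    i.e. a TimeoutError; real code would rely on a socket timeout instead.
    """

    def __init__(self, drops: int, pages: dict[str, str]) -> None:
        self.drops = drops
        self.pages = pages
        self.attempts = 0

    def send(self, path: str, timeout_s: float) -> str:
        self.attempts += 1
        if self.attempts <= self.drops:
            raise TimeoutError(f"no reply within {timeout_s} s")
        # An application-level error is still a normal reply.
        return self.pages.get(path, "404 Not Found")


def get_with_retry(send: Callable[[str, float], str], path: str,
                   timeout_s: float = 0.5, max_attempts: int = 3) -> str:
    """Retry on system-level failures (timeouts) but not on application errors."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(path, timeout_s)  # a 404 reply is returned, not retried
        except TimeoutError as exc:       # lost message or crashed server: retry
            last_error = exc
    raise TimeoutError(f"gave up after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    transport = FlakyTransport(drops=2, pages={"/index.html": "200 OK"})
    print(get_with_retry(transport.send, "/index.html"), transport.attempts)
```

Blind retries give at-least-once semantics: if only the reply was lost, the server may execute the request twice. That is harmless for idempotent requests such as GET but not for operations like "withdraw money".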
Remote Procedure Call
Request/response protocols are widely used but too low-level
Each request must be defined separately, including its network message
representation
Remote procedure call (RPC) presents a simpler abstraction
The programmer invokes a procedure which executes on a remote machine
(the server)
The RPC subsystem takes care of message formats, communication,
timeouts, etc.
Distribution of the system becomes transparent
Integrated with the programming language
The RPC layer adds stubs at the client end which, when invoked, execute a
method at the server
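Here is one way to sketch the stub idea in Python. This is an illustrative hand-rolled RPC layer, not any particular RPC framework. JSON is an assumed wire format, `Server`, `Stub`, and `add` are hypothetical names, and an in-process function call stands in for the network:

```python
import json
from typing import Any, Callable


class Server:
    """Server side: unmarshals a request and runs the named procedure."""

    def __init__(self) -> None:
        self.procedures: dict[str, Callable[..., Any]] = {}

    def register(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        self.procedures[fn.__name__] = fn
        return fn

    def handle(self, message: bytes) -> bytes:
        request = json.loads(message)
        try:
            result = self.procedures[request["method"]](*request["params"])
            return json.dumps({"result": result}).encode()
        except Exception as exc:  # report the failure back to the caller
            return json.dumps({"error": repr(exc)}).encode()


class Stub:
    """Client side: looks like a local object; every call is marshalled and sent."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport

    def __getattr__(self, name: str) -> Callable[..., Any]:
        def remote_call(*args: Any) -> Any:
            message = json.dumps({"method": name, "params": list(args)}).encode()
            reply = json.loads(self._transport(message))
            if "error" in reply:
                raise RuntimeError(reply["error"])
            return reply["result"]
        return remote_call


if __name__ == "__main__":
    server = Server()

    @server.register
    def add(a: int, b: int) -> int:
        return a + b

    remote = Stub(server.handle)  # in-process transport stands in for the network
    print(remote.add(2, 3))       # reads like a local call
```

The caller never builds a message by hand. Message formats live entirely in `Stub` and `Server.handle`, and that is the transparency the slide describes.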
Example: XML-RPC
XML is used to encode method invocations (method names,
parameters, etc.)
HTTP POST is used to send the request and receive the response (also
encoded in XML)
Looks like a regular web session on the wire, so it plays well with
middleboxes (e.g., firewalls and proxies)
Language-agnostic and extensible
Extending it with more features (namespaces, user-defined types, etc.)
and diverse transports (TCP, UDP, etc.) resulted in the Simple Object
Access Protocol (SOAP)
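Python's standard library ships an XML-RPC client and server (`xmlrpc.client`, `xmlrpc.server`), so the flow above can be tried locally. This is a minimal sketch. It assumes a loopback-only demo server on an OS-chosen port, and the `add` procedure is hypothetical:

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def add(a: int, b: int) -> int:
    return a + b


def start_server() -> tuple[SimpleXMLRPCServer, int]:
    # Port 0 lets the OS choose a free port; loopback only, for the demo.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


if __name__ == "__main__":
    # The XML body that an XML-RPC client POSTs for add(2, 3).
    print(xmlrpc.client.dumps((2, 3), methodname="add"))

    server, port = start_server()
    with xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
        print(proxy.add(2, 3))  # the proxy marshals to XML, POSTs, and unmarshals
    server.shutdown()
    server.server_close()
```

`ServerProxy` plays the role of the client-side stub: `proxy.add` is not defined anywhere on the client. Each attribute access becomes an XML `methodCall` sent over HTTP POST.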
RPC shortcomings
Traditional RPC mechanisms are synchronous
Client blocks until the response is received
Poor responsiveness, especially in high-latency networks
The mid-2000s ushered in the age of Asynchronous JavaScript and XML (AJAX)
Update a web page without reloading it
For instance, Google Maps, Gmail, etc.
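AJAX runs in the browser in JavaScript, but the asynchronous pattern it relies on can be sketched with Python's `asyncio`. In this sketch `remote_call` is hypothetical, and `asyncio.sleep` stands in for a network round trip. The client issues several requests, keeps doing other work, and handles each reply as it arrives:

```python
import asyncio


async def remote_call(name: str, delay_s: float, log: list[str]) -> str:
    """Hypothetical remote call; asyncio.sleep stands in for a network round trip."""
    log.append(f"send {name}")
    await asyncio.sleep(delay_s)
    log.append(f"recv {name}")
    return f"{name} data"


async def client(log: list[str]) -> list[str]:
    # Fire off both requests without waiting for either reply.
    tasks = [
        asyncio.create_task(remote_call("inbox", 0.02, log)),
        asyncio.create_task(remote_call("map tiles", 0.01, log)),
    ]
    log.append("UI keeps handling user input")  # runs before any reply arrives
    return await asyncio.gather(*tasks)  # results come back in request order


if __name__ == "__main__":
    events: list[str] = []
    print(asyncio.run(client(events)))
    print(events)
```

The total wait is roughly that of the slowest request rather than the sum of all of them, and the "UI" is never blocked.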
Representational State Transfer
AJAX still revolves around RPC (just asynchronously)
Representational State Transfer (REST) offers an alternative
Every resource has a name: a URL or URI
Resources are manipulated with the HTTP methods GET, PUT, POST, and
DELETE
The state needed is sent along with each operation (the server keeps no
client session)
Widely used these days (for instance, by Amazon, Twitter, etc.)
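The REST model can be sketched as a dispatcher over (method, URI) pairs. This is a toy in-memory Python sketch, not a real HTTP server. `ResourceStore` and the example URIs are hypothetical, and the status codes follow common HTTP usage:

```python
from typing import Optional


class ResourceStore:
    """Toy REST-style server: in-memory resources named by URI (no real HTTP)."""

    def __init__(self) -> None:
        self.resources: dict[str, str] = {}
        self.next_id = 1

    def handle(self, method: str, uri: str,
               body: Optional[str] = None) -> tuple[int, Optional[str]]:
        """Return (HTTP status code, response body). Every request is self-contained."""
        if method == "GET":
            if uri in self.resources:
                return 200, self.resources[uri]
            return 404, None
        if method == "PUT":    # create or replace the resource at this URI
            status = 200 if uri in self.resources else 201
            self.resources[uri] = body or ""
            return status, None
        if method == "POST":   # create a new child resource; the server picks its URI
            new_uri = f"{uri.rstrip('/')}/{self.next_id}"
            self.next_id += 1
            self.resources[new_uri] = body or ""
            return 201, new_uri
        if method == "DELETE":
            if uri in self.resources:
                del self.resources[uri]
                return 204, None
            return 404, None
        return 405, None       # method not allowed


if __name__ == "__main__":
    store = ResourceStore()
    print(store.handle("PUT", "/users/alice", "Alice"))
    print(store.handle("POST", "/tweets", "hello"))
    print(store.handle("GET", "/users/alice"))
```

Unlike the RPC stubs above, there is no per-operation procedure name. The same few verbs apply to every resource, and everything a request needs travels with it.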
Clocks
Distributed systems need to be able to:
Order events produced by concurrent processes
Synchronize senders and receivers of messages
Serialize concurrent accesses to shared objects
Coordinate joint activity
Clocks are employed for this
But quartz oscillators oscillate at slightly different frequencies, leading
to clock drift, which in turn results in clock skew between clocks
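A quick calculation shows why drift matters. This sketch assumes a drift rate ρ on the order of 10^-6 seconds per second, a commonly quoted figure for ordinary quartz clocks that is treated here as an assumption. Two clocks drifting in opposite directions can then diverge by up to 2ρ per elapsed second. The function names are illustrative:

```python
SECONDS_PER_DAY = 86_400


def worst_case_skew(drift_rate: float, elapsed_s: float) -> float:
    """Two clocks drifting in opposite directions at `drift_rate` seconds per second
    can diverge by up to 2 * drift_rate * elapsed_s."""
    return 2 * drift_rate * elapsed_s


def max_resync_interval(drift_rate: float, skew_bound_s: float) -> float:
    """Longest gap between synchronizations that keeps skew within `skew_bound_s`."""
    return skew_bound_s / (2 * drift_rate)


if __name__ == "__main__":
    rho = 1e-6  # assumed drift rate for ordinary quartz clocks
    print(worst_case_skew(rho, SECONDS_PER_DAY))  # roughly 0.17 s after one day
    print(max_resync_interval(rho, 0.010))        # roughly 5000 s for a 10 ms bound
```

Even modest drift accumulates to a noticeable skew within a day, which is why clocks must be resynchronized periodically.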
Clock synchronization
Clock synchronization algorithms try to minimize the skew between a set
of clocks
Decide upon a correct time
Communicate to agree (compensating for network delays)
Possibly involving multiple servers
In practice, a skew of roughly 1-10 ms can remain after synchronization
(usually acceptable)
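The slides do not name a specific algorithm. One classic way to "compensate for delays" is Cristian's algorithm: the client asks a time server for its clock reading and assumes the reply spent about half the round trip in transit. A minimal Python sketch follows. The function name and the sample readings are hypothetical:

```python
def cristian_estimate(t0: float, server_time: float, t1: float) -> tuple[float, float]:
    """Cristian-style estimate of the current time.

    t0, t1: client's local clock when the request was sent / reply received.
    server_time: the time server's clock reading carried in the reply.
    Assumes the reply spent about half the round trip in transit, so the
    estimate is accurate to within +/- rtt / 2.
    """
    rtt = t1 - t0
    if rtt < 0:
        raise ValueError("reply received before request was sent")
    return server_time + rtt / 2, rtt / 2


if __name__ == "__main__":
    # Hypothetical readings: local clock says 10.0 s at send, 10.5 s at receipt.
    estimate, error = cristian_estimate(t0=10.0, server_time=20.0, t1=10.5)
    offset = estimate - 10.5  # how far the local clock must be adjusted
    print(estimate, error, offset)  # 20.25 0.25 9.75
```

The error bound shrinks with the round-trip time. Protocols such as NTP refine this idea by querying multiple servers and filtering samples.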
Ordering
Time is used to ensure ordering
Withdraw money at 23:59:45
Bank calculates interest at 00:00:00
The withdrawn money should not be included in the interest calculation
In most cases, we only need to know that event a happened before event b,
known as the happens-before relation
Multiple algorithms exist to capture the happens-before relation (e.g.,
Lamport clocks, vector clocks)
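One of these algorithms, Lamport's logical clocks, fits in a few lines. The following is a minimal Python sketch. The class name and the bank scenario wiring are illustrative, and messages are passed as plain integers rather than over a network:

```python
class LamportClock:
    """Logical clock: if event a happened-before event b, then C(a) < C(b)."""

    def __init__(self) -> None:
        self.time = 0

    def local_event(self) -> int:
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; its timestamp travels with the message.
        return self.local_event()

    def receive(self, msg_time: int) -> int:
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


if __name__ == "__main__":
    customer, bank = LamportClock(), LamportClock()
    bank.local_event()
    bank.local_event()                 # unrelated bank activity, t=2
    withdraw = customer.send()         # withdrawal request, t=1
    applied = bank.receive(withdraw)   # t = max(2, 1) + 1 = 3
    interest = bank.local_event()      # interest calculation, t=4
    print(withdraw, applied, interest)  # 1 3 4
```

Only one direction of the relation is guaranteed. If C(a) < C(b), it does not follow that a happened before b. Vector clocks are needed to detect truly concurrent events.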
Distributed Mutual Exclusion
Concurrent access to shared resources needs to be synchronized
On a local machine, this relies on hardware support (e.g., atomic
instructions)
Locks, semaphores, etc. are built on top of it
But this support is not available across a distributed system
Distributed Mutual Exclusion (2)
Multiple methods exist to ensure this:
Central lock server: All lock requests are handled by a central server
Token passing: Nodes are arranged into a ring and a token is passed
around; only the token holder may enter the critical section
Totally-ordered multicast: Clients multicast requests to each other
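The central lock server is the simplest of these methods. In the following Python sketch, `on_request` and `on_release` are hypothetical handlers for the request and release messages. Message transport is omitted, and requests are granted first come, first served:

```python
from collections import deque
from typing import Optional


class CentralLockServer:
    """Grants a single lock, first come first served (hypothetical message handlers)."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.waiting: deque[str] = deque()

    def on_request(self, client: str) -> bool:
        """Return True if the lock is granted now, False if the client must wait."""
        if self.holder is None:
            self.holder = client
            return True
        self.waiting.append(client)
        return False

    def on_release(self, client: str) -> Optional[str]:
        """Release the lock and return the next client granted it, if any."""
        if client != self.holder:
            raise ValueError(f"{client} does not hold the lock")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


if __name__ == "__main__":
    server = CentralLockServer()
    print(server.on_request("A"))  # True: A enters the critical section
    print(server.on_request("B"))  # False: B is queued
    print(server.on_release("A"))  # B: the lock passes to B
```

The design is simple and fair, but the server is a single point of failure and a potential bottleneck. Token passing and totally-ordered multicast avoid the central node at the cost of more messages.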
Consensus
Getting processes in a distributed system to agree on a value
Requirements for a correct solution:
Agreement: All nodes arrive at the same answer
Validity: The answer is one that was proposed by some node
Termination: All nodes eventually decide
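The three requirements can be made concrete with a deliberately simple Python sketch. It assumes one synchronous round with reliable delivery and no crashes, and under those assumptions "broadcast your proposal, then decide the minimum" is correct. The function names are hypothetical, and this is not a fault-tolerant protocol:

```python
def one_round_min(proposals: dict[str, int]) -> dict[str, int]:
    """Every node broadcasts its proposal; every node then decides the minimum.

    Assumes one synchronous round with reliable delivery and no crashes, so
    every node ends up with exactly the same set of proposals.
    """
    inbox = {node: list(proposals.values()) for node in proposals}  # all-to-all
    return {node: min(received) for node, received in inbox.items()}


def satisfies_consensus(proposals: dict[str, int], decisions: dict[str, int]) -> bool:
    agreement = len(set(decisions.values())) == 1                        # same answer
    validity = all(v in proposals.values() for v in decisions.values())  # was proposed
    termination = set(decisions) == set(proposals)                       # all decided
    return agreement and validity and termination


if __name__ == "__main__":
    proposals = {"n1": 7, "n2": 3, "n3": 5}
    decisions = one_round_min(proposals)
    print(decisions, satisfies_consensus(proposals, decisions))
```

The hard part is dropping those assumptions. In a fully asynchronous system where even one process may crash, no deterministic algorithm can guarantee all three properties (the FLP result). This is why practical protocols such as Paxos and Raft rely on timeouts.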
Distributed transactions
Transactions are composite operations (i.e., a collection of reads and
updates to a set of objects)
A transaction is atomic
If it commits, all operations are applied
If it aborts, no state is mutated at all
Distributed transactions span multiple transaction processing servers
For instance, booking flights: Lahore -> Dubai -> New York
Either the entire trip is booked or none of it is
Actions need to be coordinated across multiple parties
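The slides do not name a coordination protocol. The standard one for atomic commit is two-phase commit (2PC): a coordinator first asks every participant to prepare and vote, then tells all of them to commit only if every vote was yes. Here is a minimal in-process Python sketch for the flight example. `Participant`, `two_phase_commit`, and the seat counts are hypothetical, and no failures or timeouts are modelled:

```python
class Participant:
    """One transaction-processing server, e.g. one airline's booking system."""

    def __init__(self, name: str, free_seats: int) -> None:
        self.name = name
        self.free_seats = free_seats
        self.held = False

    def prepare(self) -> bool:
        """Phase 1: tentatively hold a seat and vote yes, or vote no."""
        if self.free_seats > 0:
            self.free_seats -= 1
            self.held = True
            return True
        return False

    def commit(self) -> None:
        self.held = False  # the held seat becomes a confirmed booking

    def abort(self) -> None:
        if self.held:
            self.free_seats += 1  # release the tentative hold
            self.held = False


def two_phase_commit(participants: list[Participant]) -> bool:
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    decision = all(votes)
    for p in participants:                       # phase 2: broadcast the outcome
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


if __name__ == "__main__":
    lahore_dubai = Participant("LHE-DXB", free_seats=1)
    dubai_new_york = Participant("DXB-JFK", free_seats=0)
    print(two_phase_commit([lahore_dubai, dubai_new_york]))
    print(lahore_dubai.free_seats)  # the first leg's seat was released again
```

With no seat on the second leg, the whole trip aborts and the first leg's hold is undone. A known weakness of 2PC is that participants holding tentative state can block if the coordinator crashes between the two phases.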
Replication
Many distributed systems involve replication
Data replication: Multiple copies of some object stored at different
servers
Computation replication: Multiple servers capable of providing an
operation
Advantages:
1 Load balancing: Work is spread out across servers
2 Lower latency: Better performance if a replica is close to the client
3 Fault tolerance: Failure of some replicas can be tolerated
Examples: DNS, content distribution networks, database replication,
etc.
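Two of these advantages, load balancing and fault tolerance, can be seen in a small read path over data replicas. This is a minimal Python sketch. `ReplicatedReader` is hypothetical, replicas are in-memory dicts, and "down" is simply a flag rather than a real failure detector:

```python
import itertools


class ReplicatedReader:
    """Round-robin reads over replicas, skipping ones marked as down (hypothetical)."""

    def __init__(self, replicas: dict[str, dict[str, str]]) -> None:
        self.replicas = replicas  # replica name -> its copy of the data
        self.up = set(replicas)   # replicas currently believed to be alive
        self._order = itertools.cycle(sorted(replicas))

    def read(self, key: str) -> tuple[str, str]:
        for _ in range(len(self.replicas)):  # try each replica at most once
            name = next(self._order)
            if name in self.up:
                return name, self.replicas[name][key]
        raise RuntimeError("no replica available")


if __name__ == "__main__":
    copy = {"x": "42"}
    reader = ReplicatedReader({"r1": dict(copy), "r2": dict(copy), "r3": dict(copy)})
    print([reader.read("x")[0] for _ in range(3)])  # load spread over all replicas
    reader.up.discard("r2")                         # simulate a failed replica
    print([reader.read("x")[0] for _ in range(2)])  # reads still succeed
```

This sketch only reads. Keeping the copies consistent when they are written is the hard part of replication, and it leads directly to the CAP trade-off.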
CAP
CAP:
1 Consistency: All nodes see the same state
2 Availability: All requests get a response
3 Partition tolerance: The system continues to operate even when the
network partitions (messages between nodes are lost)
Brewer’s conjecture states that a distributed system can guarantee only 2
of these 3 properties
In practice, partitions are a given: hardware and software fail all the
time
Therefore, during a partition, systems need to choose between
consistency and availability
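The choice can be illustrated with a toy store that has two replicas and a "partitioned" switch. This is a simplified Python sketch, not a model of any real database. `TwoReplicaStore`, `Unavailable`, and the "CP"/"AP" mode names are hypothetical labels for the two sides of the trade-off:

```python
class Unavailable(Exception):
    """Raised when a CP store refuses a request it cannot keep consistent."""


class TwoReplicaStore:
    """Toy store with replicas 'a' and 'b'; clients write via 'a'.

    mode="CP": during a partition, refuse writes (consistent, not available).
    mode="AP": during a partition, accept writes locally (available, not consistent).
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a: dict[str, int] = {}
        self.b: dict[str, int] = {}
        self.partitioned = False  # True when replica a cannot reach replica b

    def write(self, key: str, value: int) -> None:
        if not self.partitioned:
            self.a[key] = value
            self.b[key] = value  # replicate synchronously
        elif self.mode == "CP":
            raise Unavailable("cannot reach replica b")
        else:
            self.a[key] = value  # b is now stale until the partition heals


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoReplicaStore(mode)
        store.write("x", 1)
        store.partitioned = True
        try:
            store.write("x", 2)
        except Unavailable as exc:
            print(mode, "refused write:", exc)
        print(mode, "a =", store.a, "b =", store.b)
```

The CP store keeps both replicas identical but turns clients away. The AP store answers every write, but a reader of replica b sees stale data until the replicas are reconciled.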