  • Object marshalling: do you send pointed-to objects eagerly or lazily? Eager sending can cut down on latency, but lazy sending saves bandwidth (a 1 GB list is too much to send). Maybe send objects out to a certain horizon of pointer depth? Error conditions: type errors, function not found, version mismatches, network connectivity issues. Do you stop running RPC hosts? Do you keep running and save results to a designated file? If the client disconnects at some point after the RPC host has finished, do we roll back our state changes?
  • ASK: Can the lock on B be pushed to after we write to table A? (Yes.) Go over why this still maintains isolation.
  • Network and distributed systems

    1. Lecture 3 – Networks and Distributed Systems
       CSE 490h – Introduction to Distributed Computing, Spring 2007
       Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
    2. Outline
       • Networking
       • Remote Procedure Calls (RPC)
       • Transaction Processing Systems
    3. Fundamentals of Networking
    4. Sockets: The Internet = tubes?
       • A socket is the basic network interface
       • Provides a two-way “pipe” abstraction between two applications
       • Client creates a socket and connects to the server, which receives a socket representing the other side
    5. Ports
       • Within an IP address, a port is a sub-address identifying a listening program
       • Allows multiple clients to connect to a server at once
    6. Example: Web Server (1/3)
       The server creates a listener socket attached to a specific port. 80 is the agreed-upon port number for web traffic.
    7. Example: Web Server (2/3)
       The client-side socket is still connected to a port, but the OS chooses a random unused port number. When the client requests a URL (e.g., one on the host “www.google.com”), its OS uses a system called DNS to find the server’s IP address.
    8. Example: Web Server (3/3)
       The server accepts the connection and receives a new socket dedicated to this particular client. The new socket shares the listener’s port number; connections are told apart by the client’s address and port. The listener is ready for more incoming connections while the current connection is processed in parallel.
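The three web server slides can be sketched with Python's standard `socket` module. This is a minimal illustration on the loopback interface, not a real web server: the helper names (`recv_all`, `handle_client`, `serve`, `demo`) and the echo reply are assumptions made for the example, and port 0 is used so the OS picks a free port instead of 80.

```python
import socket
import threading


def recv_all(conn: socket.socket) -> bytes:
    """Read from the socket until the peer signals end-of-stream."""
    chunks = []
    while True:
        data = conn.recv(1024)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def handle_client(conn: socket.socket) -> None:
    """Serve one client on its own connected socket, then close it."""
    with conn:
        request = recv_all(conn)
        conn.sendall(b"echo: " + request)


def serve(listener: socket.socket, max_clients: int) -> None:
    """Accept clients on the listener and handle each in a separate thread."""
    workers = []
    for _ in range(max_clients):
        conn, _addr = listener.accept()  # a new socket for this one client
        worker = threading.Thread(target=handle_client, args=(conn,))
        worker.start()  # the loop goes straight back to accept()
        workers.append(worker)
    for worker in workers:
        worker.join()


def demo() -> bytes:
    # Port 0 lets the OS pick a free port; a real web server would use 80.
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve, args=(listener, 1))
    server.start()
    # The client never names its own port; the OS assigns an unused one.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"GET /index.html")
        client.shutdown(socket.SHUT_WR)  # tell the server the request is done
        reply = recv_all(client)
    server.join()
    listener.close()
    return reply
```

Calling `demo()` runs one client against the listener and returns the server's reply.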
    9. What makes this work?
       • Underneath the socket layer are several more protocols
       • Most important are TCP and IP (used together so often that they are commonly spoken of as one protocol: TCP/IP)
       Even lower-level protocols handle how data is sent over Ethernet wires, or how bits are sent through the air using 802.11 wireless…
    10. IP: The Internet Protocol
        • Defines the addressing scheme for computers
        • Encapsulates internal data in a “packet”
        • Does not provide reliability
        • Includes just enough information to tell routers where to send the data
    11. TCP: Transmission Control Protocol
        • Built on top of IP
        • Introduces concept of “connection”
        • Provides reliability and ordering
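One way to picture how ordering can be layered on top of unordered packets is a receiver that reassembles data by sequence number. The sketch below is a heavily simplified model, not the actual TCP algorithm: the function name `reassemble` and the (sequence number, payload) tuples are assumptions for illustration, and real TCP also handles retransmission timers, window sizes, and sequence number wraparound.

```python
def reassemble(segments):
    """Rebuild a byte stream from (sequence_number, payload) pairs.

    Segments may arrive out of order or more than once. Returns the
    contiguous bytes received so far and the next sequence number the
    receiver is still waiting for (what it would acknowledge).
    """
    received = {}
    for seq, payload in segments:
        received.setdefault(seq, payload)  # a duplicate segment is ignored

    stream = b""
    expected = 0
    while expected in received:  # stop at the first gap
        payload = received[expected]
        stream += payload
        expected += len(payload)
    return stream, expected
```

For example, `reassemble([(5, b" worl"), (0, b"hello"), (10, b"d")])` delivers the bytes in order even though the first segment arrived second, while a missing segment leaves the stream stalled at the gap until it is resent.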
    12. Why is This Necessary?
        • Not actually tube-like “under the hood”
        • Unlike the phone system (circuit-switched), the packet-switched Internet uses many routes at once
    13. Networking Issues
        • If a party to a socket disconnects, how much data did they receive?
        • … Did they crash? Or did a machine in the middle?
        • Can someone in the middle intercept/modify our data?
        • Traffic congestion makes switch/router topology important for efficient throughput
    14. Remote Procedure Calls (RPC)
    15. How RPC Doesn’t Work
        • Regular client-server protocols involve sending data back and forth according to a shared state
        Client: GET index.html HTTP/1.0
        Server: 200 OK
                Length: 2400
                (file data)
        Client: GET hello.gif HTTP/1.0
        Server: 200 OK
                Length: 81494
        …
    16. Remote Procedure Call
        • RPC servers call arbitrary functions in a DLL or EXE, with arguments passed over the network and return values sent back over the network
        Client: foo.dll,bar(4, 10, “hello”)
        Server: “returned_string”
        Client: foo.dll,baz(42)
        Server: err: no such function
        …
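The exchange on this slide can be sketched as a server-side dispatcher that looks up the requested function by name and returns either its result or an error. Everything here is an assumption for illustration: the JSON wire format, the `EXPORTED` table, and the body of `bar` are invented, and real RPC systems such as DCOM or CORBA use their own encodings.

```python
import json


def bar(a, b, text):
    """Example remote function; its behavior is made up for this sketch."""
    return f"{text}:{a + b}"


# Functions the server is willing to run on behalf of clients.
EXPORTED = {"bar": bar}


def handle_request(raw: bytes) -> bytes:
    """Decode a request, call the named function, and encode the reply."""
    request = json.loads(raw)
    func = EXPORTED.get(request["function"])
    if func is None:
        return json.dumps({"error": "no such function"}).encode()
    try:
        result = func(*request["args"])
    except Exception as exc:  # report failures instead of crashing the server
        return json.dumps({"error": str(exc)}).encode()
    return json.dumps({"result": result}).encode()
```

A request naming `baz` gets the "no such function" error, mirroring the second call on the slide.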
    17. Possible Interfaces
        • RPC can be used with two basic interfaces: synchronous and asynchronous
        • Synchronous RPC is a “remote function call”: the client blocks and waits for the return value
        • Asynchronous RPC is a “remote thread spawn”
    18. Synchronous RPC
    19. Asynchronous RPC
    20. Asynchronous RPC 2: Callbacks
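The three calling styles can be contrasted with Python's `concurrent.futures`. This is only a local analogy: `remote_square` is a hypothetical stand-in for a function running on another machine, and the thread pool plays the role of the RPC runtime.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_square(x):
    """Stand-in for a remote function; the sleep imitates network delay."""
    time.sleep(0.05)
    return x * x


def demo():
    callback_results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Synchronous: the caller blocks until the value comes back.
        sync_value = remote_square(3)

        # Asynchronous: submit() returns a Future immediately; the caller
        # can do other work and collect the value later.
        future = pool.submit(remote_square, 4)
        async_value = future.result()

        # Callback: the caller never waits; the runtime invokes the
        # callback once the call completes.
        cb_future = pool.submit(remote_square, 5)
        cb_future.add_done_callback(lambda f: callback_results.append(f.result()))
    # Leaving the with-block waits for all outstanding calls to finish.
    return sync_value, async_value, callback_results
```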
    21. Wrapper Functions
        • Writing rpc_call(foo.dll, bar, arg0, arg1..) is poor form
          • Confusing code
          • Breaks abstraction
        • Wrapper function makes code cleaner
          • bar(arg0, arg1); //just write this; calls “stub”
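A stub can be sketched as a small generated function that hides the `rpc_call` plumbing. The `rpc_call` below is a hypothetical local fake that records the call and computes a sum, standing in for code that would actually send the request over the network; `make_stub` and `CALL_LOG` are likewise names invented for this example.

```python
CALL_LOG = []


def rpc_call(library, function, *args):
    """Hypothetical low-level RPC entry point (faked locally for the sketch)."""
    CALL_LOG.append((library, function, args))
    if (library, function) == ("foo.dll", "bar"):
        return sum(args)  # pretend the remote bar() adds its arguments
    raise LookupError(f"no such function: {library},{function}")


def make_stub(library, function):
    """Build a wrapper so callers can write bar(x, y) instead of rpc_call(...)."""
    def stub(*args):
        return rpc_call(library, function, *args)
    stub.__name__ = function
    return stub


# Client code only ever sees this ordinary-looking function.
bar = make_stub("foo.dll", "bar")
```

With the stub in place, `bar(4, 10)` reads like a local call while still going through `rpc_call` underneath.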
    22. More Design Considerations
        • Who can call RPC functions? Anybody?
        • How do you handle multiple versions of a function?
        • Need to marshal objects
        • How do you handle error conditions?
        • Numerous protocols: DCOM, CORBA, Java RMI…
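One marshalling tradeoff is how much of a pointed-to object graph to send eagerly. The sketch below copies nested lists and dicts down to a fixed depth and replaces anything deeper with a placeholder that a client could fetch lazily. The `marshal` function, the `horizon` parameter, and the `"$ref"` placeholder format are all assumptions for illustration, not part of any real RPC protocol.

```python
def marshal(obj, horizon):
    """Copy nested lists/dicts up to `horizon` levels deep.

    Containers beyond the horizon become {"$ref": <id>} placeholders,
    standing for objects the receiver would have to request separately.
    """
    if not isinstance(obj, (list, dict)):
        return obj  # plain values are always sent eagerly
    if horizon == 0:
        return {"$ref": id(obj)}  # hypothetical handle for a lazy fetch
    if isinstance(obj, list):
        return [marshal(item, horizon - 1) for item in obj]
    return {key: marshal(value, horizon - 1) for key, value in obj.items()}
```

A small horizon keeps messages short at the cost of extra round trips later; a large one sends everything up front.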
    23. Transaction Processing Systems (We’re using the blue cover sheets on the TPS reports now…)
    24. TPS: Definition
        • A system that handles transactions coming from several sources concurrently
        • Transactions are “events that generate and modify data stored in an information system for later retrieval” *
        * http://en.wikipedia.org/wiki/Transaction_Processing_System
    25. Key Features of TPS: ACID
        • “ACID” is the acronym for the features a TPS must support:
        • Atomicity – A set of changes must all succeed or all fail
        • Consistency – Changes to data must leave the data in a valid state when the full change set is applied
        • Isolation – The effects of a transaction must not be visible until the entire transaction is complete
        • Durability – After a transaction has been committed successfully, the state change must be permanent.
    26. Atomicity & Durability
        • What happens if we write half of a transaction to disk and the power goes out?
    27. Logging: The Undo Buffer
        (1) Database writes to the log the current values of all cells it is going to overwrite
        (2) Database overwrites cells with new values
        (3) Database marks log entry as committed
        • If the database crashes during (2), we use the log to roll back the tables to their prior state
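The undo-log steps can be sketched with an in-memory toy. The `TinyDB` class, its `crash_after` parameter, and the dictionary standing in for an on-disk log are all assumptions for illustration; a real database must force the log to stable storage before overwriting any cell.

```python
class TinyDB:
    """Toy key-value store with a single-entry undo log."""

    def __init__(self, cells):
        self.cells = dict(cells)
        self.log = None  # stand-in for a log kept on disk

    def apply(self, updates, crash_after=None):
        """Apply updates to existing cells; optionally simulate a crash."""
        # Step 1: record the current values of every cell we will overwrite.
        self.log = {
            "old": {key: self.cells[key] for key in updates},
            "committed": False,
        }
        # Step 2: overwrite the cells one at a time.
        for count, (key, value) in enumerate(updates.items()):
            if crash_after is not None and count == crash_after:
                raise RuntimeError("simulated crash")
            self.cells[key] = value
        # Step 3: mark the log entry as committed.
        self.log["committed"] = True

    def recover(self):
        """After a restart, undo any transaction that never committed."""
        if self.log is not None and not self.log["committed"]:
            self.cells.update(self.log["old"])
        self.log = None
```

If `apply` fails partway through step 2, calling `recover()` restores every logged cell, so the half-written transaction leaves no trace.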
    28. Consistency: Data Types
        • Data entered in databases have rigorous data types associated with them, and explicit ranges
        • Does not protect against all errors (entering a date in the past is still a valid date, etc.), but eliminates tedious programmer concerns
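Declared constraints can be tried out with Python's built-in `sqlite3` module. The `people` table, its columns, and the 0 to 150 age range are made up for this sketch. SQLite is also looser about column types than most database servers, so the example relies on `NOT NULL` and `CHECK` rather than on type enforcement.

```python
import sqlite3

people_db = sqlite3.connect(":memory:")
people_db.execute(
    "CREATE TABLE people ("
    " name TEXT NOT NULL,"
    " age INTEGER NOT NULL CHECK (age BETWEEN 0 AND 150))"
)


def try_insert_person(conn, name, age):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO people (name, age) VALUES (?, ?)", (name, age)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

An age of -5 or a missing name is rejected by the database itself, so application code does not need to repeat those checks.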
    29. Consistency: Foreign Keys
        • Database designers declare that fields are indices into the keys of another table
        • Database ensures that target key exists before allowing value in source field
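A foreign key check can be sketched with `sqlite3` as well. The `customers` and `orders` tables are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

orders_db = sqlite3.connect(":memory:")
orders_db.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
orders_db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
orders_db.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
orders_db.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
orders_db.commit()


def try_insert_order(conn, customer_id):
    """Return True if the order is accepted, False if the target key is missing."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

An order for customer 1 is accepted, while an order for a customer id that does not exist is refused before it reaches the table.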
    30. Isolation
        • Using mutual-exclusion locks, we can prevent other processes from reading data we are in the process of writing
        • When a database is prepared to commit a set of changes, it locks any records it is going to update before making the changes
    31. Faulty Locking
        • Locking alone does not ensure isolation!
        • Changes to table A are visible before changes to table B – this is not an isolated transaction
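The faulty case can be reproduced deterministically without real concurrency. In this sketch the tables are plain dictionary entries and `between_steps` is a hypothetical hook that runs another transaction at exactly the moment the bug is exposed; both are assumptions for illustration.

```python
import threading


def faulty_transfer(tables, locks, amount, between_steps):
    """Move `amount` from A to B, but release A's lock too early."""
    with locks["A"]:
        tables["A"] -= amount
    # Lock A is released here, yet table B has not been updated.
    between_steps()  # another transaction gets to run at this point
    with locks["B"]:
        tables["B"] += amount


def read_total(tables, locks):
    """A reader that correctly takes both locks before looking."""
    with locks["A"], locks["B"]:
        return tables["A"] + tables["B"]
```

Even though every access is protected by a lock, a reader scheduled between the two steps sees money missing from the total, which is the isolation failure the slide describes.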
    32. Two-Phase Locking
        • After a transaction has released any locks, it may not acquire any new locks
        • Effect: The lock set owned by a transaction has a “growing” phase and a “shrinking” phase
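The two-phase rule can be enforced by a small helper that tracks which phase a transaction is in. The `TwoPhaseTransaction` class and the `transfer` function are hypothetical names for this sketch, which ignores deadlock handling entirely.

```python
import threading


class TwoPhaseTransaction:
    """Tracks held locks and refuses to acquire after the first release."""

    def __init__(self):
        self.held = []
        self.shrinking = False

    def acquire(self, lock):
        if self.shrinking:
            raise RuntimeError("two-phase locking violated: acquire after release")
        lock.acquire()
        self.held.append(lock)

    def release_all(self):
        self.shrinking = True  # from now on, no new locks
        while self.held:
            self.held.pop().release()


def transfer(tables, locks, amount):
    """Move `amount` from table A to table B under two-phase locking."""
    tx = TwoPhaseTransaction()
    try:
        tx.acquire(locks["A"])
        tables["A"] -= amount
        tx.acquire(locks["B"])  # still in the growing phase, so this is allowed
        tables["B"] += amount
    finally:
        tx.release_all()  # shrinking phase: nothing was released before now
    return tx
```

Lock B is taken only after table A is written, yet no other transaction can observe the intermediate state, because lock A is held until both writes are done.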
    33. Relationship to Distributed Comp
        • At the heart of a TPS is usually a large database server
        • Several distributed clients may connect to this server at various points in time
        • Database may be spread across multiple servers, but must still maintain ACID
    34. Conclusions
        • We’ve seen 3 layers that make up a distributed system
        • Designing a large distributed system involves engineering tradeoffs at each of these levels
        • Appreciating subtle concerns at each level requires diving past the abstractions, but abstractions are still useful in general
