Network and distributed systems
  • Object marshalling: do you send pointed-to objects eagerly or lazily? (Eager sending can cut down on latency, but lazy sending saves bandwidth; 1 GB lists are too much to send. Maybe send objects out to a certain depth of pointer-following?) Error conditions: type errors, function not found, version mismatches, network connectivity issues. Do you stop running RPC hosts? Keep running and save results to a designated file? If the client disconnects at some point after the RPC host has finished, do we roll back our state changes?
  • ASK: Can the lock on B be pushed to after we write to table A? (Yes.) Go over why this still maintains isolation.

Presentation Transcript

  • Lecture 3 – Networks and Distributed Systems. CSE 490h – Introduction to Distributed Computing, Spring 2007. Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
  • Outline
    • Networking
    • Remote Procedure Calls (RPC)
    • Transaction Processing Systems
  • Fundamentals of Networking
  • Sockets: The Internet = tubes?
    • A socket is the basic network interface
    • Provides a two-way “pipe” abstraction between two applications
    • The client creates a socket and connects to the server, which receives a socket representing the other side
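The socket calls above can be sketched with Python's standard `socket` module. This is a minimal illustration, not part of the original lecture: both ends run in one process on 127.0.0.1, port 0 asks the OS for any free port, and the helper names `recv_all`, `upper_server`, and `demo` are hypothetical.

```python
import socket
import threading

def recv_all(sock: socket.socket) -> bytes:
    # Read until the other side shuts down its sending direction
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)

def upper_server(listener: socket.socket) -> None:
    # accept() hands the server a new socket representing the client's end of the pipe
    conn, _addr = listener.accept()
    with conn:
        request = recv_all(conn)
        conn.sendall(request.upper())  # reply over the same two-way pipe

def demo(message: bytes = b"hello") -> bytes:
    # Server side: a listening socket on localhost (port 0 = any free port)
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    server = threading.Thread(target=upper_server, args=(listener,))
    server.start()
    # Client side: create a socket and connect to the server
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        reply = recv_all(client)
    server.join()
    listener.close()
    return reply

if __name__ == "__main__":
    print(demo())  # prints b'HELLO'
```

The same pipe carries data in both directions: the client writes `hello`, and the server's reply comes back on that socket.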
  • Ports
    • Within an IP address, a port is a sub-address identifying a listening program
    • Allows multiple clients to connect to a server at once
  • Example: Web Server (1/3) The server creates a listener socket attached to a specific port. 80 is the agreed-upon port number for web traffic.
  • Example: Web Server (2/3) The client-side socket is also bound to a port, but the OS chooses a random unused port number. When the client requests a URL (e.g., “”), its OS uses a system called DNS to find the server's IP address.
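  • Example: Web Server (3/3) The server's OS creates a new socket to handle this particular client (with TCP, this socket keeps the listener's port number; the connection is told apart from others by the client's address and port). The listener is ready for more incoming connections while we process the current connection in parallel.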
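The three web-server steps can be sketched with Python's `socket` and `threading` modules. This is an illustration rather than a real web server: it listens on an OS-chosen port on 127.0.0.1 instead of port 80 (which usually needs administrator rights), answers every connection with a bare `200 OK`, and uses hypothetical helper names. It also records a detail worth checking: with TCP, the per-client socket returned by `accept()` reports the same local port as the listener, while each client gets its own OS-chosen port.

```python
import socket
import threading

def handle(conn: socket.socket, results: list) -> None:
    # Runs in its own thread so the listener can keep accepting clients
    with conn:
        server_port = conn.getsockname()[1]  # the accepted socket's local port
        client_port = conn.getpeername()[1]  # the client's OS-chosen port
        results.append((server_port, client_port))
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n")

def accept_loop(listener: socket.socket, n_clients: int, results: list) -> None:
    workers = []
    for _ in range(n_clients):
        conn, _addr = listener.accept()
        worker = threading.Thread(target=handle, args=(conn, results))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()

def demo(n_clients: int = 3):
    # Port 0 instead of 80, so no special privileges are needed
    listener = socket.create_server(("127.0.0.1", 0))
    listen_port = listener.getsockname()[1]
    results: list = []
    acceptor = threading.Thread(target=accept_loop, args=(listener, n_clients, results))
    acceptor.start()
    client_ports = []
    for _ in range(n_clients):
        with socket.create_connection(("127.0.0.1", listen_port)) as client:
            client_ports.append(client.getsockname()[1])
            client.recv(1024)  # wait for the (tiny) response
    acceptor.join()
    listener.close()
    return listen_port, client_ports, results
```

Handing each accepted socket to a worker thread is one simple way to get the parallelism the slide describes; thread pools or event loops are common alternatives.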
  • What makes this work?
    • Underneath the socket layer are several more protocols
    • Most important are TCP and IP, which are used together so often that they are commonly spoken of as one protocol: TCP/IP
    Even lower-level protocols handle how data is sent over Ethernet wires, or how bits are sent through the air using 802.11 wireless…
  • IP: The Internet Protocol
    • Defines the addressing scheme for computers
    • Encapsulates internal data in a “packet”
    • Does not provide reliability: packets may be lost, duplicated, or delivered out of order
    • Includes just enough information (such as the destination address) for routers to know where to send the packet
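To make “just enough information for routers” concrete, here is a sketch that packs a minimal 20-byte IPv4 header (the field layout of RFC 791) with Python's `struct` module. It only builds bytes and sends nothing; `ipv4_header` and `checksum` are hypothetical helper names, and options, fragmentation, and the type-of-service byte are left at zero. Note what is absent: no sequence numbers, acknowledgments, or retransmission.

```python
import socket
import struct

def checksum(header: bytes) -> int:
    # Internet checksum: one's-complement sum of the 16-bit words, then inverted
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) + header[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ipv4_header(src: str, dst: str, payload_len: int, proto: int = 6, ttl: int = 64) -> bytes:
    """Build a 20-byte IPv4 header (no options). proto=6 means TCP."""
    version_ihl = (4 << 4) | 5  # version 4, header length = 5 words of 32 bits
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl,
        0,                      # type of service
        20 + payload_len,       # total length of header + data, in bytes
        0,                      # identification (used for fragmentation)
        0,                      # flags + fragment offset
        ttl,                    # hops left before a router discards the packet
        proto,                  # which protocol the payload belongs to
        0,                      # checksum placeholder, filled in below
        socket.inet_aton(src),
        socket.inet_aton(dst),  # the field routers use to forward the packet
    )
    csum = checksum(header)
    return header[:10] + struct.pack("!H", csum) + header[12:]
```

A receiver can validate the header by running `checksum` over all 20 bytes, checksum field included; a correct header yields 0.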
  • TCP: Transmission Control Protocol
    • Built on top of IP
    • Introduces concept of “connection”
    • Provides reliability and ordering
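A toy receiver shows how sequence numbers let TCP restore ordering and discard duplicates on top of IP. This is a simplification, not real TCP: there are no retransmission timers, windows, or partially overlapping segments, and `Segment` and `Reassembler` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    seq: int     # byte offset of this segment's first byte in the stream
    data: bytes

class Reassembler:
    """Toy receiver: delivers bytes in order and drops duplicates.

    Assumes segments never partially overlap.
    """

    def __init__(self) -> None:
        self.next_seq = 0                    # next byte offset we expect
        self.pending: dict[int, bytes] = {}  # early arrivals, keyed by seq
        self.delivered = bytearray()         # what the application has read

    def receive(self, seg: Segment) -> int:
        if seg.seq >= self.next_seq:         # ignore data already delivered
            self.pending.setdefault(seg.seq, seg.data)
        # Deliver every buffered segment that now lines up with next_seq
        while self.next_seq in self.pending:
            data = self.pending.pop(self.next_seq)
            self.delivered += data
            self.next_seq += len(data)
        return self.next_seq                 # cumulative acknowledgment
```

If segments arrive as `world` (seq 5), `hello` (seq 0), a duplicate `hello`, then `!` (seq 10), the acknowledgments are 0, 10, 10, 11 and the application reads `helloworld!` in order.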
  • Why is This Necessary?
    • Not actually tube-like “under the hood”
    • Unlike the circuit-switched phone system, the packet-switched Internet can send the packets of a single conversation over many different routes at once
  • Networking Issues
    • If a party to a socket disconnects, how much data did they receive?
    • … Did they crash? Or did a machine in the middle?
    • Can someone in the middle intercept/modify our data?
    • Traffic congestion makes switch/router topology important for efficient throughput
  • Remote Procedure Calls (RPC)
  • How RPC Doesn’t Work
    • Regular client-server protocols involve sending data back and forth according to a shared state
    Client: GET index.html HTTP/1.0
    Server: 200 OK, Length: 2400, (file data)
    Client: GET hello.gif HTTP/1.0
    Server: 200 OK, Length: 81494
    …
  • Remote Procedure Call
    • RPC servers call arbitrary functions in a DLL or executable, with arguments passed over the network and return values sent back over the network
    Client: foo.dll,bar(4, 10, “hello”)
    Server: “returned_string”
    Client: foo.dll,baz(42)
    Server: err: no such function
    …
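A minimal sketch of the server side of this exchange, assuming JSON as the wire format and a dictionary as the table of callable functions. Both are assumptions for illustration; real RPC systems define their own encodings and lookup mechanisms. What `bar` computes is invented here, since the slide only shows it returning a string.

```python
import json
from typing import Any, Callable

# Hypothetical registry standing in for "functions exported by foo.dll"
REGISTRY: dict[tuple[str, str], Callable[..., Any]] = {
    ("foo.dll", "bar"): lambda n, m, s: f"{s}:{n + m}",
}

def dispatch(request: dict) -> dict:
    """Look up the requested function and call it with the unmarshalled args."""
    fn = REGISTRY.get((request["module"], request["function"]))
    if fn is None:
        return {"error": "no such function"}
    try:
        return {"result": fn(*request["args"])}
    except TypeError as exc:  # e.g. wrong number of arguments
        return {"error": f"bad arguments: {exc}"}

def handle_message(raw: bytes) -> bytes:
    # Unmarshal the request bytes, dispatch, and marshal the reply
    return json.dumps(dispatch(json.loads(raw))).encode()
```

A request for `foo.dll,baz(42)` comes back as `{"error": "no such function"}`, mirroring the second call in the exchange.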
  • Possible Interfaces
    • RPC can be used with two basic interfaces: synchronous and asynchronous
    • Synchronous RPC is a “remote function call” – client blocks and waits for return val
    • Asynchronous RPC is a “remote thread spawn”
  • Synchronous RPC
  • Asynchronous RPC
  • Asynchronous RPC 2: Callbacks
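The three styles above can be sketched with Python's built-in `xmlrpc` modules, used here only as a convenient stand-in for an RPC system (the lecture does not prescribe one). The remote function `add`, the OS-chosen port, and the helper names are assumptions for illustration; the asynchronous and callback styles are built from a thread pool on the client.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def start_server() -> SimpleXMLRPCServer:
    # Port 0 lets the OS pick a free port; "add" is a stand-in remote function
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(lambda a, b: a + b, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def call_add(url: str, a: int, b: int) -> int:
    # A fresh proxy per call keeps each thread's HTTP connection separate
    with ServerProxy(url) as proxy:
        return proxy.add(a, b)

def demo():
    server = start_server()
    url = f"http://127.0.0.1:{server.server_address[1]}/"

    # Synchronous RPC: the caller blocks until the return value arrives
    sync_result = call_add(url, 2, 3)

    with ThreadPoolExecutor(max_workers=2) as pool:
        # Asynchronous RPC: start the call, keep working, collect the result later
        future = pool.submit(call_add, url, 10, 20)
        # ... the client is free to do other work here ...
        async_result = future.result()

        # Callback style: a function runs when the result arrives
        received = []
        done = threading.Event()

        def on_result(f: Future) -> None:
            received.append(f.result())
            done.set()

        pool.submit(call_add, url, 100, 1).add_done_callback(on_result)
        done.wait(timeout=5)

    server.shutdown()
    server.server_close()
    return sync_result, async_result, received
```

`demo()` returns `(5, 30, [101])`. Only the first call makes the client wait at the call site; the other two let it continue until it asks for (or is handed) the result.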
  • Wrapper Functions
    • Writing rpc_call(foo.dll, bar, arg0, arg1, ...) is poor form
      • Confusing code
      • Breaks abstraction
    • Wrapper function makes code cleaner
      • bar(arg0, arg1); // just write this; it calls a “stub”
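A sketch of how a stub can hide the generic call. Here `rpc_call` is a hypothetical low-level primitive that only records the request so the example runs without a network; `make_stub` generates a wrapper so client code can call `bar(...)` like a local function.

```python
from typing import Any, Callable

SENT: list = []  # records requests so the sketch runs without a network

def rpc_call(module: str, function: str, *args: Any) -> Any:
    """Hypothetical low-level RPC primitive.

    A real one would marshal the arguments, send them to the server,
    block for the reply, and unmarshal the return value.
    """
    SENT.append((module, function, args))
    return f"called {module}:{function}{args}"

def make_stub(module: str, function: str) -> Callable[..., Any]:
    # The stub hides rpc_call so callers can write bar(arg0, arg1)
    def stub(*args: Any) -> Any:
        return rpc_call(module, function, *args)
    stub.__name__ = function
    return stub

# Client code sees an ordinary-looking local function
bar = make_stub("foo.dll", "bar")
```

Calling `bar(4, 10)` returns `"called foo.dll:bar(4, 10)"`. Real RPC frameworks usually generate such stubs from an interface description rather than by hand.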
  • More Design Considerations
    • Who can call RPC functions? Anybody?
    • How do you handle multiple versions of a function?
    • Need to marshal objects
    • How do you handle error conditions?
    • Numerous protocols: DCOM, CORBA, JRMI…
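One possible answer to the marshalling question is a compromise between eager and lazy: copy pointed-to objects eagerly down to a fixed depth and replace anything deeper with a handle the receiver could fetch with a later call. The sketch below does that with JSON and stamps each message with a protocol version so the receiver can reject mismatches. `Node`, `PROTOCOL_VERSION`, the depth limit, and the use of `id()` as a handle are all illustrative assumptions; `id()` values only mean something inside the sending process, which would have to keep `handles` alive to serve later fetches.

```python
import json
from dataclasses import dataclass, field
from typing import Any

PROTOCOL_VERSION = 2  # hypothetical version number checked by the receiver

@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)

def marshal(node: Node, depth: int, handles: dict[int, Node]) -> dict[str, Any]:
    """Eagerly copy `node` and its children down to `depth` levels.

    Deeper objects become handles that the receiver could fetch lazily.
    """
    if depth == 0:
        handles[id(node)] = node  # sender keeps the object so it can be fetched later
        return {"handle": id(node)}
    return {"name": node.name,
            "children": [marshal(c, depth - 1, handles) for c in node.children]}

def encode_call(function: str, arg: Node, depth: int = 1) -> tuple[str, dict[int, Node]]:
    handles: dict[int, Node] = {}
    message = {"version": PROTOCOL_VERSION, "function": function,
               "arg": marshal(arg, depth, handles)}
    return json.dumps(message), handles

def check_version(raw: str) -> dict[str, Any]:
    # Receiver side: refuse messages from a different protocol version
    message = json.loads(raw)
    if message.get("version") != PROTOCOL_VERSION:
        raise ValueError(f"version mismatch: {message.get('version')}")
    return message
```

Raising the depth trades bandwidth for fewer round trips; a depth of 0 sends only a handle, the fully lazy extreme.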
  • Transaction Processing Systems (We’re using the blue cover sheets on the TPS reports now…)
  • TPS: Definition
    • A system that handles transactions coming from several sources concurrently
    • Transactions are “events that generate and modify data stored in an information system for later retrieval” *
  • Key Features of TPS: ACID
    • “ACID” is the acronym for the features a TPS must support:
    • Atomicity – A set of changes must all succeed or all fail
    • Consistency – Changes to data must leave the data in a valid state when the full change set is applied
    • Isolation – The effects of a transaction must not be visible until the entire transaction is complete
    • Durability – After a transaction has been committed successfully, the state change must be permanent.
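Atomicity (together with a simple consistency rule) can be seen with Python's built-in `sqlite3` module, used here as a stand-in for a TPS database. The `accounts` table, the `transfer` function, and the no-negative-balance rule are illustrative assumptions.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    # `with conn:` commits if the block finishes and rolls back if it raises
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            # Consistency rule violated: abort, which undoes BOTH updates (atomicity)
            raise ValueError("insufficient funds")

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

After a successful `transfer(db, "alice", "bob", 30)` the balances are `{"alice": 70, "bob": 80}`; an attempted transfer of 500 raises `ValueError` and leaves them unchanged, even though its first `UPDATE` had already run.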
  • Atomicity & Durability
    • What happens if we write half of a transaction to disk and the power goes out?
  • Logging: The Undo Buffer
    1. The database writes to the log the current values of all cells it is going to overwrite (this log record must reach disk before step 2 begins)
    2. The database overwrites the cells with the new values
    3. The database marks the log entry as committed
    • If the database crashes during step 2, it uses the log on restart to roll the tables back to their prior state
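The three logging steps and the recovery rule can be sketched as a toy in-memory key-value store. This is an illustration under simplifying assumptions: a Python dict stands in for the disk, a list for the log, and the crash is simulated with an exception. The `UndoLogStore` class and its `crash_after` parameter are hypothetical.

```python
from typing import Optional

MISSING = object()  # marks a cell that did not exist before the write

class UndoLogStore:
    """Toy key-value store with an undo log; `disk` and `log` stand in for durable storage."""

    def __init__(self, disk: dict) -> None:
        self.disk = disk
        self.log: list = []

    def write(self, updates: dict, crash_after: Optional[int] = None) -> None:
        # Step 1: log the current values of every cell we are about to overwrite
        entry = {"old": {k: self.disk.get(k, MISSING) for k in updates}, "committed": False}
        self.log.append(entry)  # a real system forces this record to disk first
        # Step 2: overwrite the cells with the new values
        for i, (key, value) in enumerate(updates.items()):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("simulated crash during step 2")
            self.disk[key] = value
        # Step 3: mark the log entry as committed
        entry["committed"] = True

    def recover(self) -> None:
        # Undo every uncommitted entry, newest first, restoring the prior values
        for entry in reversed(self.log):
            if entry["committed"]:
                continue
            for key, old in entry["old"].items():
                if old is MISSING:
                    self.disk.pop(key, None)
                else:
                    self.disk[key] = old
        self.log = [entry for entry in self.log if entry["committed"]]
```

Starting from `{"x": 1, "y": 2}`, a committed write of `x=10, y=20` survives recovery, while a later write that "crashes" after updating only `x` and `y` is rolled back to `{"x": 10, "y": 20}`.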
  • Consistency: Data Types
    • Data entered into databases has rigorous data types associated with it, and explicit ranges
    • Does not protect against all errors (a date in the past is still a valid date, etc.), but eliminates many tedious checks the programmer would otherwise write
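A sketch of declared types and ranges, again using SQLite through Python's `sqlite3` module as an assumed stand-in. SQLite column types are loose by default, so the `CHECK` constraints here spell out the type and range explicitly; the `orders` table and its limits are hypothetical. The date check relies on SQLite's `date()` returning NULL for text it cannot parse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id       INTEGER PRIMARY KEY,
        -- type and range are both enforced by the database
        quantity INTEGER NOT NULL
                 CHECK (typeof(quantity) = 'integer' AND quantity BETWEEN 1 AND 1000),
        -- date() returns NULL for text it cannot parse as a date
        placed   TEXT NOT NULL CHECK (date(placed) IS NOT NULL)
    )
""")

def try_insert(quantity, placed) -> bool:
    # Returns False when the database rejects the row
    try:
        with conn:
            conn.execute("INSERT INTO orders (quantity, placed) VALUES (?, ?)", (quantity, placed))
        return True
    except sqlite3.IntegrityError:
        return False
```

`try_insert(0, "2007-04-01")` and `try_insert(5, "not a date")` are rejected, but `try_insert(5, "1900-01-01")` is accepted: a date in the past is still a valid date.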
  • Consistency: Foreign Keys
    • Database designers declare that a field's values are references to the keys of another table
    • Database ensures that target key exists before allowing value in source field
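A foreign key in SQLite, via Python's `sqlite3` module, as a small illustration. The `customers`/`orders` schema is hypothetical. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless asked
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("""
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id)  -- must match a customer key
        )
    """)
    with conn:
        conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
    return conn

def add_order(conn: sqlite3.Connection, order_id: int, customer_id: int) -> bool:
    try:
        with conn:
            conn.execute("INSERT INTO orders (id, customer_id) VALUES (?, ?)", (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:  # raised when the referenced customer does not exist
        return False
```

An order for customer 1 is accepted; an order for customer 99 is refused because no such key exists in `customers`.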
  • Isolation
    • Using mutual-exclusion locks, we can prevent other processes from reading data we are in the middle of writing
    • When a database is prepared to commit a set of changes, it locks any records it is going to update before making the changes
  • Faulty Locking
    • Locking alone does not ensure isolation!
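    • If a transaction locks table A, writes it, and releases the lock before locking and writing table B, other transactions can see the changes to A before the changes to B; this is not an isolated transaction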
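The faulty pattern can be reproduced deterministically with Python threads. In this sketch (hypothetical names throughout), one transaction is meant to set both tables A and B to 1 together, but it releases A's lock before taking B's. Two `threading.Event` objects artificially force a reader into that gap, an interleaving that could otherwise happen by chance.

```python
import threading

tables = {"A": 0, "B": 0}
locks = {"A": threading.Lock(), "B": threading.Lock()}

def faulty_transaction(paused: threading.Event, resume: threading.Event) -> None:
    # Intended as ONE transaction that sets both A and B to 1
    with locks["A"]:
        tables["A"] = 1
    # Lock on A already released, lock on B not yet taken: a visible gap
    paused.set()
    resume.wait()
    with locks["B"]:
        tables["B"] = 1

def read_both() -> tuple:
    # A well-behaved reader that locks both tables before reading
    with locks["A"], locks["B"]:
        return tables["A"], tables["B"]

def demo() -> tuple:
    tables.update(A=0, B=0)
    paused, resume = threading.Event(), threading.Event()
    writer = threading.Thread(target=faulty_transaction, args=(paused, resume))
    writer.start()
    paused.wait()              # force the reader to run inside the gap
    seen_in_gap = read_both()
    resume.set()
    writer.join()
    return seen_in_gap, read_both()
```

`demo()` returns `((1, 0), (1, 1))`: the reader took both locks properly and still observed a half-finished transaction.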
  • Two-Phase Locking
    • After a transaction has released any locks, it may not acquire any new locks
    • Effect: The lock set owned by a transaction has a “growing” phase and a “shrinking” phase
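The rule can be checked mechanically. In this sketch, a hypothetical `Transaction` wrapper around `threading.Lock` objects remembers whether it has entered its shrinking phase and refuses any acquisition after that point. Real databases commonly hold all locks until commit (strict two-phase locking); this only illustrates the rule as stated on the slide.

```python
import threading

class TwoPhaseLockingError(RuntimeError):
    """Raised when a transaction tries to lock after it has started unlocking."""

class Transaction:
    """Enforces two-phase locking over a shared table of named locks."""

    def __init__(self, locks: dict) -> None:
        self.locks = locks       # shared: name -> threading.Lock
        self.held: set = set()
        self.shrinking = False   # becomes True at the first release

    def acquire(self, name: str) -> None:
        if self.shrinking:
            raise TwoPhaseLockingError(f"cannot lock {name!r} after releasing a lock")
        self.locks[name].acquire()   # growing phase: blocks until the lock is free
        self.held.add(name)

    def release(self, name: str) -> None:
        self.locks[name].release()
        self.held.discard(name)
        self.shrinking = True        # from now on, no new locks may be acquired

    def release_all(self) -> None:
        # Typically done at commit time, completing the shrinking phase
        for name in sorted(self.held):
            self.release(name)
```

The faulty transaction from the previous slide (lock A, write, unlock A, then lock B) is rejected at its second `acquire`. Acquiring B can still be deferred until after the write to A, as long as A's lock has not yet been released.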
  • Relationship to Distributed Comp
    • At the heart of a TPS is usually a large database server
    • Several distributed clients may connect to this server at the same time
    • Database may be spread across multiple servers, but must still maintain ACID
  • Conclusions
    • We’ve seen three layers that make up a distributed system: networking, RPC, and transaction processing
    • Designing a large distributed system involves engineering tradeoffs at each of these levels
    • Appreciating subtle concerns at each level requires diving past the abstractions, but abstractions are still useful in general