Distributed Systems - An introduction
- Features of Distributed Systems,
- Distributed Shared Memory,
- Computer systems developed from:
- standalone machines,
- to direct communications between two machines,
- to networks: one machine can communicate with any other networked machine.
- In all cases the user is always aware of the connection between machines,
- Must issue explicit commands for the movement of data.
- Distributed Systems (DS) build on the networking layer:
- Groups of independent machines acting together as one
- Cooperate on a task, not just data sharing,
- Distribute computation among several physical machines.
- Distribution should be transparent to user AND programs at system call interface:
- User or programmer should not be able to tell that a remote machine is involved,
- IDEALLY DS should look like a conventional system to users:
- Appear to users as a single computer.
- Simplest possible distributed system architecture:
- Some processes provide services to others, known as servers,
- Processes which use those services are called clients.
- This approach is known as a client/server system.
- What the server does, and any data it returns to the client, varies depending on system requirements,
- ALL different models of distributed computing can be reduced to this simple model.
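As a rough illustration of the client/server model above, the following Python sketch runs one server and one client on the same machine over a loopback TCP socket. The "service" (upper-casing the request text) and the function names `serve_once`, `request_service` and `demo` are hypothetical and chosen only for this example; a real server would loop, handle many clients, and cope with partial reads.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server: accept one client, read its request, send back a reply."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024).decode("utf-8")
        # Hypothetical service: return the request text in upper case.
        conn.sendall(request.upper().encode("utf-8"))


def request_service(host: str, port: int, text: str) -> str:
    """Client: connect to the server, send a request, return the reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(text.encode("utf-8"))
        return sock.recv(1024).decode("utf-8")


def demo() -> str:
    # Port 0 asks the operating system for any free port on loopback.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    host, port = listener.getsockname()
    server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    server.start()
    try:
        return request_service(host, port, "hello server")
    finally:
        server.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    print(demo())
```

Note that the client has to know the server's address and port explicitly, which is exactly the kind of detail a distributed system tries to hide.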
Features of Distributed Systems (1)
- DS have features not found in standalone systems:
- Price/performance ratio favours multiple small machines:
- Especially commodity components,
- See Top500 list in later slide,
- Cumulative performance of microcomputers at a fraction of the cost of a mainframe.
- Utilise spare CPU cycles by dynamically using idle workstations. How many PCs are in the University's labs? In effect, free processing power!
- Adapt to increased load and not collapse,
- BUT more processors = more communications?: ISSUES HERE?!
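One way to make the "more processors = more communications" question concrete is a simple speedup model. The sketch below is an assumption layered on top of these notes, not something they define: with `comm_cost` set to zero it is Amdahl's law, and the `comm_cost` term is a hypothetical, deliberately crude stand-in for the extra communication each additional machine introduces.

```python
def speedup(n: int, parallel_fraction: float, comm_cost: float = 0.0) -> float:
    """Estimated speedup of a task spread over n machines.

    parallel_fraction: share of the work that can be split across machines.
    comm_cost: hypothetical extra time added per additional machine,
               expressed as a fraction of the single-machine run time.
    """
    serial = 1.0 - parallel_fraction
    time = serial + parallel_fraction / n + comm_cost * (n - 1)
    return 1.0 / time


if __name__ == "__main__":
    for n in (1, 8, 32, 128):
        ideal = speedup(n, 0.95)
        with_comms = speedup(n, 0.95, comm_cost=0.001)
        print(f"{n:4d} machines: ideal {ideal:.2f}, with comms {with_comms:.2f}")
```

Without communication cost the speedup can never exceed 1 / (1 - parallel_fraction); with a non-zero cost in this model, adding machines eventually makes the task slower rather than faster.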
Features of Distributed Systems (2)
- High reliability and fault tolerance (through redundancy, user need never know a problem occurred!).
- Economic: Share an expensive device (e.g. a radio telescope)
- Convenience: not convenient to share a company’s database via floppy disk: distribution and data update problems!!!
- Not necessary to buy all processing power, memory, storage all at one time,
- System expands to keep pace with growing demand.
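The reliability-through-redundancy point can be illustrated with a little arithmetic. The sketch below assumes that machines fail independently of each other, which is an idealisation not stated in these notes; the function name `availability` is hypothetical.

```python
def availability(single: float, replicas: int) -> float:
    """Probability that at least one of `replicas` machines is up.

    single: probability that any one machine is up (0.0 to 1.0).
    Assumes failures are independent, which real systems only approximate.
    """
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, "replica(s):", round(availability(0.99, count), 6))
```

Under this assumption, two machines that are each up 99% of the time give a service that is up about 99.99% of the time, provided the client can be switched to the surviving machine without noticing.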
Example: Top 500 List (1)
- Example of price/performance mentioned earlier.
- Clusters are a type of distributed system.
- Top500 List: lists sites operating the 500 most powerful computer systems ( http://www.top500.org/ ):
- Entry to top 10 positions requires > 9.8 TFlop/s (trillions of floating-point operations per second).
- #1: BlueGene/L DD2 beta, IBM/DOE, USA, 70.72 TFlop/s:
- 32,768 0.7GHz PowerPC 440 CPUs,
- #2: SGI Altix, Voltaire Infiniband, NASA, USA, 51.87 TFlop/s:
- 10,160 1.5 GHz SGI Altix CPUs,
- #3: Earth Simulator, Japan (Climate Modelling): 35.86 TFlop/s
- 5,120 500 MHz NEC Vector CPUs (640 nodes of 8 CPUs each), 10 TB memory, 16 GB/s inter-node bandwidth,
- Was #1 from 6/2002 until 11/2004 when BlueGene/L DD2 took over.
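The figures quoted above can be turned into a rough per-CPU comparison. The sketch below uses only the TFlop/s values and CPU counts listed on this slide; the helper name `gflops_per_cpu` is hypothetical, and the result says nothing about cost, which the slide does not give.

```python
# (system, TFlop/s, CPU count) exactly as quoted in the list above
SYSTEMS = [
    ("BlueGene/L DD2 beta", 70.72, 32768),
    ("SGI Altix (NASA)", 51.87, 10160),
    ("Earth Simulator", 35.86, 5120),
]


def gflops_per_cpu(tflops: float, cpus: int) -> float:
    """Average performance per CPU in GFlop/s (1 TFlop/s = 1000 GFlop/s)."""
    return tflops * 1000.0 / cpus


if __name__ == "__main__":
    for name, tflops, cpus in SYSTEMS:
        print(f"{name}: {gflops_per_cpu(tflops, cpus):.2f} GFlop/s per CPU")
```

On these numbers the #1 system has the lowest per-CPU figure of the three and reaches the top by using far more processors, which fits the earlier point about many small machines.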
Example: Top 500 List (2)
- Highest ranking Intel cluster (at the moment):
- #10: NCSA, Urbana-Champaign, USA,
- The same CPUs that you might buy for your own machines (Intel Xeon),
- Myrinet probably not sitting on the shelf at your local PC World!
- Economies of scale (COTS production):
- Operating System issues: How to coordinate various distributed components into a coherent system?!
A Top 500 contender?
- Need to uniquely identify all resources
- Individual machines, processes, files, printers, etc,
- At system level identified by binary numbers,
- Client needs to know identity of server machine AND identifier of process providing a service:
- If server crashes and restarts: process may have a different process ID. Client unable to reach it.
- If server machine crashes, another machine may take over servicing client requests. BUT client continues sending requests to old server’s binary address…
- At human level provide meaningful resource names:
- e.g. LaserPrinter1 instead of 192.168.1.7 or the binary network address representation.
- At machine level still identified by binary numbers,
- Binding: the link between name and number,
- A name server maintains a database of bindings:
- Translates names to binary (or IP) addresses for the client,
- If binary identifiers change then name server only needs updating,
- If a client can locate the name server, it can locate any other resources in the system.
- A single name can be used to reference a number of servers: Server fail-over, load-balancing, etc…
- Two different types of underlying operating system have developed for use in distributed environments:
- Network Operating System (NOS),
- Distributed Operating System.
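The name-to-address bindings described earlier in this list can be sketched as a toy in-memory name server. Everything here is illustrative: the class `NameServer`, its methods, the name `FileService` and the addresses other than 192.168.1.7 are hypothetical, and the round-robin rotation is just one simple way a single name might refer to several servers.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class NameServer:
    """Toy name server: maps human-readable names to machine addresses."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[str]] = {}
        self._rotation: Dict[str, Iterator[str]] = {}

    def bind(self, name: str, addresses: List[str]) -> None:
        """Create or replace the binding for a name."""
        if not addresses:
            raise ValueError("a binding needs at least one address")
        self._bindings[name] = list(addresses)
        self._rotation[name] = cycle(self._bindings[name])

    def resolve(self, name: str) -> str:
        """Return one address for the name, rotating if several are bound."""
        if name not in self._bindings:
            raise KeyError(f"no binding for {name!r}")
        return next(self._rotation[name])


if __name__ == "__main__":
    ns = NameServer()
    ns.bind("LaserPrinter1", ["192.168.1.7"])
    print(ns.resolve("LaserPrinter1"))
    # The printer moves: only the name server needs updating.
    ns.bind("LaserPrinter1", ["192.168.1.9"])
    print(ns.resolve("LaserPrinter1"))
    # One name, several servers: requests are spread across them.
    ns.bind("FileService", ["192.168.1.20", "192.168.1.21"])
    print([ns.resolve("FileService") for _ in range(4)])
```

Clients keep using the same name throughout; when an address changes, only the binding held by the name server is updated.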
Network Operating System
- Attach file systems from a remote server onto a local machine:
- Remote file system appears part of the local directory structure,
- User sees no difference between local and remote files.
- NOS only transfers portions (blocks) of a file that are actually in use,
- If a file is modified: changes are written back to the server,
- NOS allows other resources like printers to appear local to the client.
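The block-at-a-time transfer described above can be sketched with an in-memory stand-in for the server. The classes `RemoteFile` and `CachedFile`, the tiny `BLOCK_SIZE`, and the choice to write changes back only on `flush()` are all assumptions made for illustration; the notes do not say when a real NOS writes modified blocks back.

```python
from typing import Dict, Set

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow


class RemoteFile:
    """Stands in for a file held on the server (hypothetical, in-memory)."""

    def __init__(self, data: bytes) -> None:
        self.data = bytearray(data)
        self.blocks_sent = 0  # how many blocks have crossed the "network"

    def read_block(self, index: int) -> bytes:
        self.blocks_sent += 1
        start = index * BLOCK_SIZE
        return bytes(self.data[start:start + BLOCK_SIZE])

    def write_block(self, index: int, block: bytes) -> None:
        start = index * BLOCK_SIZE
        self.data[start:start + len(block)] = block


class CachedFile:
    """Client-side view: fetches blocks on demand, writes changes back."""

    def __init__(self, remote: RemoteFile) -> None:
        self.remote = remote
        self.cache: Dict[int, bytearray] = {}
        self.dirty: Set[int] = set()

    def _block(self, index: int) -> bytearray:
        # Only blocks that are actually touched are fetched from the server.
        if index not in self.cache:
            self.cache[index] = bytearray(self.remote.read_block(index))
        return self.cache[index]

    def read_byte(self, offset: int) -> int:
        return self._block(offset // BLOCK_SIZE)[offset % BLOCK_SIZE]

    def write_byte(self, offset: int, value: int) -> None:
        self._block(offset // BLOCK_SIZE)[offset % BLOCK_SIZE] = value
        self.dirty.add(offset // BLOCK_SIZE)

    def flush(self) -> None:
        """Write every modified block back to the server."""
        for index in sorted(self.dirty):
            self.remote.write_block(index, bytes(self.cache[index]))
        self.dirty.clear()
```

Reading or writing a single byte pulls in only the block that contains it, and the server copy changes only once the modified block has been flushed.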
Distributed Operating System
- True distributed OS must (at very least) begin to blur the boundaries between machines,
- Still responsible for managing local resources:
- CPU, network interface, etc.
- But also responsible for:
- Advertising resources to clients,
- Export/import/schedule processes to/from other machines.
- Should do all this TRANSPARENTLY!
- Remote Procedure Calls (RPC) provide a way to transmit data between processes on different machines transparently:
- Hides underlying socket communications.
- Fully distributed OS not yet in widespread use.
- Usually single machine OS specially adapted:
- BUT not fully transparent.
- A number of approaches proposed as basis for future distributed operating systems
- E.g. CORBA, Distributed Computing Environment (DCE),
- But what about other middleware: Jini, JXTA, Web Services?
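The RPC idea mentioned in this section, a call that looks local while the data actually travels over a socket, can be tried out with the XML-RPC modules in the Python standard library. XML-RPC is just one concrete RPC mechanism picked for this sketch and is not one named in these notes; the remote procedure `add` and the helpers `start_server` and `remote_add` are hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a: int, b: int) -> int:
    """Hypothetical procedure that the server exposes to remote callers."""
    return a + b


def start_server() -> SimpleXMLRPCServer:
    # Port 0 lets the operating system choose a free port on loopback.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_add(a: int, b: int) -> int:
    """Start a server, call add() through a proxy, then shut down."""
    server = start_server()
    try:
        host, port = server.server_address
        with ServerProxy(f"http://{host}:{port}/") as proxy:
            # Reads like a local call; the arguments travel over a socket.
            return proxy.add(a, b)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(remote_add(2, 3))
```

The caller never touches a socket directly, but it still has to know the server's address, which is one reason approaches like this fall short of full transparency.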
Distributed Shared Memory
- Allow memory to be shared by processes on different machines.
- Allows a shared memory programming model to be used by cooperating processes in distributed systems:
- Using this model a standalone system could be distributed with minimum effort.
- Transparent to the programmer:
- Location of processes not relevant to programmer,
- Can be on local or remote machines.
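To give a feel for what a distributed shared memory layer has to do behind the scenes, the sketch below simulates two "machines" inside a single Python process. It is a toy under stated assumptions: the classes `Directory` and `Node` are hypothetical, each page holds one integer, and the write-through-with-invalidation scheme is just one simple way to keep cached copies consistent, not a method taken from these notes.

```python
from typing import Dict, List


class Directory:
    """Holds the authoritative copy of every page and knows all nodes."""

    def __init__(self) -> None:
        self.pages: Dict[int, int] = {}
        self.nodes: List["Node"] = []

    def fetch(self, page: int) -> int:
        return self.pages.get(page, 0)

    def store(self, writer: "Node", page: int, value: int) -> None:
        self.pages[page] = value
        # Every other node must drop its now stale cached copy.
        for node in self.nodes:
            if node is not writer:
                node.invalidate(page)


class Node:
    """One machine: reads and writes shared pages as if they were local."""

    def __init__(self, directory: Directory) -> None:
        self.directory = directory
        self.cache: Dict[int, int] = {}
        self.remote_fetches = 0
        directory.nodes.append(self)

    def read(self, page: int) -> int:
        if page not in self.cache:
            self.remote_fetches += 1  # would be a network round trip
            self.cache[page] = self.directory.fetch(page)
        return self.cache[page]

    def write(self, page: int, value: int) -> None:
        self.cache[page] = value
        self.directory.store(self, page, value)

    def invalidate(self, page: int) -> None:
        self.cache.pop(page, None)


if __name__ == "__main__":
    shared = Directory()
    a, b = Node(shared), Node(shared)
    a.write(0, 42)
    print(b.read(0), b.remote_fetches)
```

The code calling `read` and `write` never says where a page lives, which is the transparency this section describes; the price is the hidden traffic needed to keep cached copies consistent.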
Some Issues (1)…
- Deadlock, mutual exclusion, etc. become more complicated in a distributed environment!
- Distributed File systems:
- Issues: data availability, transparency, caching, replication, consistency, etc.
- Underlying network, servers, processes may fail when least expected: redundancy, replication
- How to keep the system operating to maximum effect in the face of adverse conditions?
- Checkpointing of computations?
- Transactions must always be in a known state. How to ensure this?
- Does performance continue to grow when more machines are added or is there a saturation point? How do we overcome this?
- A very brief look at some high-level issues associated with distributed (operating) systems.
- Price, performance, availability, redundancy, reliability, sharing, transparency, scalability.