Introduction to Distributed Systems
Why do we develop distributed systems?
• the availability of powerful yet cheap microprocessors (PCs,
workstations) and continuing advances in communication technology
What is a distributed system?
A distributed system is a collection of independent computers that
appear to the users of the system as a single system.
Examples:
• Network of workstations
• Distributed manufacturing system (e.g., automated assembly
line)
• Network of branch office computers
Advantages of Distributed Systems
over Centralized Systems
• Economics: a collection of microprocessors offers a better
price/performance ratio than a mainframe; a cost-effective way to
increase computing power.
• Speed: a distributed system may have more total computing power than a
mainframe. Ex. 10,000 CPU chips, each running at 50 MIPS, give 500,000
MIPS in aggregate; a single 500,000-MIPS processor is not possible to
build, since it would require a 0.002 nsec instruction cycle (see the
sketch after this list). Enhanced performance through load distribution.
• Inherent distribution: Some applications are inherently distributed. Ex. a
supermarket chain.
• Reliability: If one machine crashes, the system as a whole can still survive.
Higher availability and improved reliability.
• Incremental growth: computing power can be added in small increments;
modular expandability.
• Another driving force: the existence of large numbers of personal
computers and the need for people to collaborate and share information.
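A quick back-of-the-envelope check of the speed example above, using the slide's illustrative numbers:

    # Back-of-the-envelope check for the "Speed" bullet above.
    # 10,000 CPUs at 50 MIPS each vs. one hypothetical 500,000-MIPS CPU.
    n_cpus = 10_000
    mips_each = 50                        # millions of instructions/sec per CPU
    total_mips = n_cpus * mips_each       # 500,000 MIPS in aggregate

    # A single processor at that rate: nanoseconds per instruction.
    cycle_ns = 1e9 / (total_mips * 1e6)
    print(total_mips, cycle_ns)           # 500000 0.002 -- a 0.002 nsec cycle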
Advantages of Distributed Systems
over Independent PCs
• Data sharing: allow many users to access a
common database
• Resource Sharing: expensive peripherals like
color printers
• Communication: enhance human-to-human
communication, e.g., email, chat
• Flexibility: spread the workload over the
available machines
Disadvantages of Distributed Systems
• Software: difficult to develop software for
distributed systems
• Network: saturation, lossy transmissions
• Security: easy access also applies to secret
data
Hardware Concepts
Taxonomy (Fig. 9-4)
MIMD (Multiple-Instruction Multiple-Data)
Tightly Coupled versus Loosely Coupled
Tightly coupled systems (multiprocessors)
o shared memory
o intermachine delay short, data rate high
Loosely coupled systems (multicomputers)
o private memory
o intermachine delay long, data rate low
Bus versus Switched MIMD
• Bus: a single network, backplane, bus, cable or other medium
that connects all machines. E.g., cable TV
• Switched: individual wires from machine to machine, with
many different wiring patterns in use.
Multiprocessors (shared memory)
– Bus
– Switched
Multicomputers (private memory)
– Bus
– Switched
Switched Multiprocessors
Switched Multiprocessors (Fig. 9-6)
for connecting a large number (say, over 64) of processors
crossbar switch: n^2 switch points
omega network: 2x2 switches; for n CPUs and n memories,
log n switching stages, each with n/2 switches,
total (n log n)/2 switches
delay problem: e.g., with n=1024 there are 10 switching
stages from CPU to memory and 10 back, a total of 20;
at 100 MIPS (10 nsec instruction execution time) this
requires 0.5 nsec switching time (see the sketch below)
NUMA (Non-Uniform Memory Access): performance depends on
the placement of program and data
building a large, tightly-coupled, shared-memory
multiprocessor is possible, but difficult and expensive
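A small check of the omega-network arithmetic above (the n, rate, and stage counts are the slide's illustrative figures):

    # Omega network sizing and delay check for the figures above.
    import math

    n = 1024                              # CPUs (and memory modules)
    stages = int(math.log2(n))            # 10 switching stages one way
    total_switches = n * stages // 2      # (n log n)/2 = 5,120 2x2 switches
    round_trip = 2 * stages               # CPU -> memory -> CPU = 20 stages

    instruction_ns = 10.0                 # 100 MIPS => 10 nsec per instruction
    per_stage_ns = instruction_ns / round_trip
    print(total_switches, per_stage_ns)   # 5120 0.5

    # A crossbar needs n**2 switch points -- why it doesn't scale:
    print(n ** 2)                         # 1048576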
Multicomputers
Bus-Based Multicomputers (Fig. 9-7)
easy to build
communication volume much smaller
relatively slow LAN (typically 10-100 Mbps, compared to
300 Mbps and up for a backplane bus)
Switched Multicomputers (Fig. 9-8)
interconnection networks: E.g., grid, hypercube
hypercube: an n-dimensional cube; each node connects to
one neighbor per dimension (see the sketch below)
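A minimal sketch of hypercube addressing, assuming nodes are numbered 0 to 2^d - 1 so that neighbors differ in exactly one bit:

    # Hypercube topology: two nodes are neighbors iff their labels
    # differ in exactly one bit, so each node has d links.
    def neighbors(node: int, d: int) -> list[int]:
        return [node ^ (1 << k) for k in range(d)]

    # In a 3-dimensional hypercube (8 nodes), node 5 = 0b101 links to:
    print(neighbors(5, 3))    # [4, 7, 1], i.e., 0b100, 0b111, 0b001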
Software Concepts
• Software, more than hardware, determines what the system looks like to users
• Three types:
1. Network Operating Systems
2. (True) Distributed Systems
3. Multiprocessor Time Sharing
Network Operating Systems
loosely-coupled software on loosely-coupled hardware
A network of workstations connected by LAN
each machine has a high degree of autonomy
o rlogin machine
o rcp machine1:file1 machine2:file2
File servers: client-server model
Clients mount directories on file servers
Best known network OS:
o Sun’s NFS (Network File System) for shared file
systems (Fig. 9-11)
only a few system-wide requirements: agreement on the
format and meaning of all messages exchanged
NFS
NFS Architecture
• Server exports directories
• Clients mount exported directories
NFS Protocols
• For handling mounting
• For read/write: no open/close; stateless (see the sketch below)
NFS Implementation
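A minimal sketch of the stateless read idea: every request carries the file handle, offset, and count, so the server keeps no per-client open-file state and can crash and restart without clients noticing anything but delay. The names here are illustrative, not the actual NFS RPC interface:

    # Stateless file service sketch: no open/close, no per-client state.
    class StatelessFileServer:
        def __init__(self, files: dict[str, bytes]):
            self.files = files        # file handle -> file contents

        def read(self, handle: str, offset: int, count: int) -> bytes:
            # The client supplies the position on every call; the server
            # keeps no seek pointers and no table of open files.
            return self.files[handle][offset:offset + count]

    server = StatelessFileServer({"fh42": b"hello, distributed world"})
    print(server.read("fh42", 7, 11))   # b'distributed'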
(True) Distributed Systems
tightly-coupled software on loosely-coupled hardware
provide a single-system image or a virtual uniprocessor
a single, global interprocess communication mechanism,
process management, file system; the same system call
interface everywhere
Ideal definition:
“A distributed system runs on a collection of
computers that do not have shared memory, yet looks
like a single computer to its users.”
Multiprocessor Operating Systems
(Fig. 9-12)
Tightly-coupled software on tightly-coupled hardware
Examples: high-performance servers
shared memory
single run queue (see the sketch below)
traditional file system as on a single-processor system:
central block cache
Fig. 9-13 for comparisons
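A minimal sketch of the single run queue in shared memory, with Python threads standing in for CPUs and a lock protecting the scheduler's data structure:

    # Single run queue: every CPU picks the next ready process from one
    # shared queue, under a lock so scheduling decisions don't collide.
    import threading
    from collections import deque

    run_queue = deque(["proc_a", "proc_b", "proc_c", "proc_d"])
    queue_lock = threading.Lock()

    def cpu(cpu_id: int) -> None:
        while True:
            with queue_lock:          # mutual exclusion on the run queue
                if not run_queue:
                    return
                proc = run_queue.popleft()
            print(f"CPU {cpu_id} runs {proc}")

    cpus = [threading.Thread(target=cpu, args=(i,)) for i in range(2)]
    for c in cpus: c.start()
    for c in cpus: c.join()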
Design Issues of Distributed Systems
• Transparency
• Flexibility
• Reliability
• Performance
• Scalability
1. Transparency
• How to achieve the single-system image, i.e., how to make a
collection of computers appear as a single computer.
• Hiding the distribution from users as well as from
application programs can be achieved at two levels:
1) hide the distribution from users
2) at a lower level, make the system look transparent to
programs.
1) and 2) require uniform interfaces, such as access to
files and communication.
Types of transparency
– Location Transparency: users cannot tell where hardware and
software resources such as CPUs, printers, files, and databases
are located.
– Migration Transparency: resources must be free to move from
one location to another without their names changing.
E.g., /usr/lee, /central/usr/lee
– Replication Transparency: OS can make additional copies of
files and resources without users noticing.
– Concurrency Transparency: users are not aware of the
existence of other users. The system must allow multiple users
to concurrently access the same resource; lock and unlock for
mutual exclusion (see the sketch after this list).
– Parallelism Transparency: Automatic use of parallelism without
having to program explicitly. The holy grail for distributed and
parallel system designers.
Users do not always want complete transparency: a fancy printer
1000 miles away
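A minimal sketch of the lock/unlock discipline behind concurrency transparency: without mutual exclusion, two users' concurrent read-modify-write updates to a shared resource can be lost. (A toy in-process example; a distributed system would use a lock server or similar.)

    # Mutual exclusion around a shared resource: the lock makes each
    # read-modify-write atomic, so no user's update is lost.
    import threading

    balance = 0
    lock = threading.Lock()

    def deposit(amount: int, times: int) -> None:
        global balance
        for _ in range(times):
            with lock:                # lock ... update ... unlock
                balance += amount

    users = [threading.Thread(target=deposit, args=(1, 100_000))
             for _ in range(2)]
    for u in users: u.start()
    for u in users: u.join()
    print(balance)                    # always 200000 with the lock held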
2. Flexibility
• Make it easier to change
• Monolithic kernel: all system calls are trapped and served
by the kernel itself, e.g., traditional UNIX.
• Microkernel: provides minimal services. (Fig 9-15)
1) IPC
2) some memory management
3) some low-level process management and scheduling
4) low-level I/O
E.g., Mach can support multiple file systems and multiple
system interfaces on top of the microkernel (see the sketch
after this list).
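A minimal sketch of the microkernel structure, assuming the kernel's only job here is passing messages (IPC) while services such as the file system run as user-level servers. Hypothetical names; the shape of the idea, not any real kernel's API:

    # Microkernel sketch: the kernel only routes messages between ports;
    # the file system is an ordinary user-level server reached via IPC.
    import queue

    class Microkernel:
        def __init__(self):
            self.ports: dict[str, queue.Queue] = {}

        def register(self, name: str) -> queue.Queue:
            self.ports[name] = queue.Queue()
            return self.ports[name]

        def send(self, name: str, msg) -> None:   # IPC: the kernel's job
            self.ports[name].put(msg)

    kernel = Microkernel()
    fs_port = kernel.register("file_server")

    files = {"/tmp/x": b"data"}                   # user-level server state
    kernel.send("file_server", ("read", "/tmp/x"))
    op, path = fs_port.get()
    print(files[path] if op == "read" else None)  # b'data'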
3. Reliability
• A distributed system should be more reliable than a single
system. Example: 3 machines, each up with probability 0.95;
the chance that at least one is up is 1 - 0.05^3 = 0.999875
(see the sketch after this list).
– Availability: fraction of time the system is usable.
Redundancy improves it.
– Need to maintain consistency
– Need to be secure
– Fault tolerance: need to mask failures, recover from
errors.
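A quick check of the availability arithmetic above; independent failures are the slide's simplifying assumption:

    # Availability of "at least one of n machines up", assuming
    # machines fail independently (the slide's simple model).
    p_up = 0.95
    n = 3
    print(1 - (1 - p_up) ** n)   # 0.999875, versus 0.95 for one machine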
4. Performance
• Without a gain here, why bother with distributed systems?
• Performance loss due to communication delays:
– fine-grain parallelism: high degree of interaction
– coarse-grain parallelism: little interaction, parallelizes
better (see the sketch after this list)
• Performance loss due to making the system fault tolerant.
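An illustrative model of why grain size matters, with made-up numbers: the formula just adds per-message latency to the ideally parallelized compute time:

    # Toy model: runtime = compute / n_cpus + messages * latency.
    # Fine-grain parallelism sends many small messages, so the
    # communication delay can wipe out the speedup entirely.
    def runtime(compute_s: float, n_cpus: int, messages: int,
                latency_s: float = 1e-3) -> float:
        return compute_s / n_cpus + messages * latency_s

    print(runtime(100, 10, messages=1_000))      # coarse grain: 11.0 s
    print(runtime(100, 10, messages=1_000_000))  # fine grain: 1010.0 s,
                                                 # slower than 1 CPU (100 s)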
5. Scalability
• Systems grow with time or become obsolete. Techniques
whose resource demands grow linearly with the size of the
system are not scalable; e.g., a broadcast-based query
won't work for large distributed systems (one alternative
is sketched after this list).
• Examples of bottlenecks
o Centralized components: a single mail server
o Centralized tables: a single URL address book
o Centralized algorithms: routing based on complete
information
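A minimal sketch of one standard way around a centralized table: hash-partition the name table across servers, so no single machine stores every entry or answers every query. A toy example with hypothetical names, ignoring replication and reconfiguration:

    # Decentralizing a lookup table: each name is owned by exactly one
    # of several servers, chosen by hashing -- no broadcast, no single
    # central table.
    import hashlib

    servers = [dict() for _ in range(4)]   # 4 partitions of the table

    def home(name: str) -> dict:
        h = int(hashlib.sha256(name.encode()).hexdigest(), 16)
        return servers[h % len(servers)]

    def register(name: str, addr: str) -> None:
        home(name)[name] = addr

    def lookup(name: str) -> str:
        return home(name)[name]            # one message to one server

    register("printer-3", "10.0.0.7")
    print(lookup("printer-3"))             # 10.0.0.7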