Distributed system notes unit I


Introduction to distributed systems
Architecture for distributed systems, goals of distributed systems, hardware and software
concepts, distributed computing models, advantages and disadvantages of distributed systems, and issues
in designing distributed systems.

Published in: Engineering

DS/UNIT 1, Truba College of Science & Tech., Bhopal. Prepared by: Nandini Sharma (CSE Deptt.)

Distributed Computing System

Over the past two decades, advancements in microelectronic technology have resulted in the availability of fast, inexpensive processors, and advancements in communication technology have resulted in the availability of cost-effective and highly efficient computer networks. The net result of the advancement in these two technologies is that the price-performance ratio has now changed to favour the use of multiple interconnected processors in place of a single high-speed processor.

Computer architectures consisting of multiple interconnected processors are basically of two types:

1. Tightly coupled systems: In these systems, there is a single system-wide primary memory (address space) that is shared by all the processors. If any processor writes, for example, the value 100 to the memory location x, any other processor subsequently reading from location x will get the value 100. Therefore, in these systems, any communication between the processors usually takes place through the shared memory.

2. Loosely coupled systems: In these systems, the processors do not share memory, and each processor has its own local memory. If a processor writes the value 100 to the memory location x, this write operation changes only the contents of its local memory and does not affect the contents of the memory of any other processor. Hence, if another processor reads the memory location x, it will get whatever value was there before in that location of its own local memory. In these systems, all physical communication between the processors is done by passing messages across the network that interconnects the processors.

Usually, tightly coupled systems are referred to as parallel processing systems, and loosely coupled systems are referred to as distributed computing systems, or simply distributed systems.

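The difference between the two types can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions: a thread stands in for a processor that shares memory, the hypothetical `Node` class stands in for a processor with private memory, and a `queue.Queue` stands in for the network link. None of these names come from the notes.

```python
import queue
import threading

# Tightly coupled: two "processors" (threads) share one address space.
shared_memory = {"x": 0}

def writer_shared():
    shared_memory["x"] = 100  # the write is visible to every thread

t = threading.Thread(target=writer_shared)
t.start()
t.join()
tightly_coupled_read = shared_memory["x"]  # a later reader sees the write


# Loosely coupled: each node owns private memory; a remote value can
# only be learned by receiving a message.
class Node:
    """Hypothetical processor with its own local memory."""

    def __init__(self, name):
        self.name = name
        self.local_memory = {"x": 0}
        self.inbox = queue.Queue()  # stands in for the network link

    def write(self, key, value):
        self.local_memory[key] = value  # affects this node only

    def send(self, other, key):
        other.inbox.put((key, self.local_memory[key]))

    def receive(self):
        key, value = self.inbox.get()
        self.local_memory[key] = value


a, b = Node("A"), Node("B")
a.write("x", 100)
before_message = b.local_memory["x"]  # B still holds its own old value
a.send(b, "x")
b.receive()
after_message = b.local_memory["x"]   # updated only through the message
```

In the loosely coupled half, node B keeps its old value of `x` until a message arrives, which is the behaviour the notes describe for distributed systems.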
A distributed system is a collection of autonomous computers linked by a computer network that appears to the users of the system as a single computer.

Some comments:
• System architecture: the machines are autonomous; this means they are computers which, in principle, could work independently.
• The user’s perception: the distributed system is perceived as a single system solving a certain problem (even though, in reality, there are several computers placed in different locations).

By running distributed system software, the computers are enabled to:
- coordinate their activities;
- share resources: hardware, software, data.

Examples of Distributed Systems: Network of workstations
• Personal workstations + processors not assigned to specific users.
• Single file system, with all files accessible from all machines in the same way and using the same path name.
• For a certain command, the system can look for the best place (workstation) to execute it.

Examples of Distributed Systems: Automatic banking (teller machine) system
• Primary requirements: security and reliability.
• Consistency of replicated data.
• Concurrent transactions (operations which involve accounts in different banks; simultaneous access).

Examples of Distributed Systems: Distributed Real-Time Systems
• Synchronization of physical clocks
• Scheduling with hard time constraints
• Real-time communication
• Fault tolerance

A distributed computing system is basically a collection of processors interconnected by a communication network, in which each processor has its own local memory and other peripherals, and the communication between any two processors of the system takes place by message passing over the communication network.

Distributed Computing System Models

Various models are used for building distributed computing systems. These models can be broadly classified into five categories: minicomputer, workstation, workstation-server, processor-pool, and hybrid. They are described below.

1. Minicomputer Model: This model is a simple extension of the centralized time-sharing system. As shown in fig., a distributed computing system based on this model consists of a few minicomputers interconnected by a communication network. Each minicomputer usually has multiple users simultaneously logged on to it. For this, several interactive terminals are connected to each minicomputer. Each user is logged on to one specific minicomputer, with remote access to other minicomputers. The network allows a user to access remote resources that are available on some machine other than the one onto which the user is currently logged. The minicomputer model may be used when resource sharing with remote users is desired (such as sharing of information databases of different types, with each type of database located on a different machine). Example: ARPAnet is a distributed computing system based on the minicomputer model.

2. Workstation Model: As shown in fig., a distributed computing system based on the workstation model consists of several workstations interconnected by a communication network.
A company’s office or a university department may have several workstations scattered throughout a building or campus, each workstation equipped with its own disk and serving as a single-user computer. It has often been found that in such an environment, at any one time, a significant proportion of the workstations are idle, resulting in the waste of large amounts of CPU time. Therefore, the idea of the workstation model is to interconnect all these workstations so that idle workstations may be used to process jobs of users who are logged onto other workstations and do not have sufficient processing power at their own workstation to get their jobs processed efficiently.

In this model, a user logs onto one of the workstations, called his or her “home” workstation, and submits jobs for execution. When the system finds that the user’s workstation does not have sufficient processing power for executing the processes of the submitted jobs efficiently, it transfers one or more of the processes from the user’s workstation to some other workstation that is idle and gets the processes executed there. Finally, the result of execution is returned to the user’s workstation.

This model is not as simple to implement as it might appear at first sight, because several issues must be resolved. These issues are as follows:
• How does the system find an idle workstation?
• How is a process transferred from one workstation to get it executed on another workstation?
• What happens to a remote process if a user logs onto a workstation that was idle until now and was being used to execute a process of another workstation?

Three commonly used approaches for handling the third issue are as follows:
• The first approach is to allow the remote process to share the resources of the workstation along with its own logged-on user’s processes. This method is easy to implement, but it defeats the main idea of workstations serving as personal computers, because if remote processes are allowed to execute simultaneously with the logged-on user’s own processes, the logged-on user does not get his or her guaranteed response.
• The second approach is to kill the remote process. The main drawbacks of this method are that all processing done for the remote process is lost and the file system may be left in an inconsistent state, making this method unattractive.
• The third approach is to migrate the remote process back to its home workstation, so that its execution can be continued there. This method is difficult to implement because it requires the system to support a pre-emptive process migration facility.

Example: The Sprite system and an experimental system developed at Xerox PARC.

3. Workstation-Server Model: The workstation model described above is a network of personal workstations, each with its own disk and a local file system. A workstation with its own local disk is usually called a diskful workstation, and a workstation without a local disk is called a diskless workstation. With the proliferation of high-speed networks, diskless workstations have become more popular in network environments than diskful workstations, making the workstation-server model more popular than the workstation model for building distributed computing systems.
As shown in fig., a distributed computing system based on the workstation-server model consists of a few minicomputers and several workstations interconnected by a communication network.

Advantages:
• In general, it is much cheaper to use a few minicomputers equipped with large, fast disks that are accessed over the network than a large number of diskful workstations, each having a small, slow disk.
• Diskless workstations are also preferred to diskful workstations from a system maintenance point of view. Backup and hardware maintenance are easier to perform with a few large disks than with many small disks scattered all over a building or campus. Furthermore, installing new releases of software is easier when the software is to be installed on a few file server machines than on every workstation.
• In the workstation-server model, since all files are managed by the file servers, users have the flexibility to use any workstation and access the files in the same manner irrespective of which workstation the user is currently logged onto. Note that this is not true with the workstation model, in which each workstation has its local file system, because different mechanisms are needed to access local and remote files.
• In the workstation-server model, a request-response protocol is mainly used to access the services of the server machines. Therefore, unlike the workstation model, this model does not need a process migration facility, which is difficult to implement.
• A user has guaranteed response time because workstations are not used for executing remote processes. However, the model does not utilize the processing capability of idle workstations.

Example: V-System.

4. Processor-Pool Model: This model is based on the observation that most of the time a user does not need any computing power, but once in a while he or she may need a very large amount of computing power for a short time. Therefore, unlike the workstation-server model in which a processor is allocated to each user, in the processor-pool model the processors are pooled together to be shared by the users as needed. The pool of processors consists of a large number of microcomputers and minicomputers attached to the network. Each processor in the pool has its own memory to load and run a system program or an application program of the distributed computing system.

As shown in figure, in the pure processor-pool model, the processors in the pool have no terminals attached directly to them, and users access the system from terminals that are attached to the network via special devices. These terminals are either small diskless workstations or graphic terminals, such as X terminals. A special server (the run server) manages and allocates the processors in the pool to different users on a demand basis. When a user submits a job for computation, an appropriate number of processors are temporarily assigned to his or her job by the run server. For example, if the user’s computation job is the compilation of a program having n segments, in which each of the segments can be compiled independently to produce separate relocatable object files, n processors from the pool can be allocated to this job to compile all the n segments in parallel. When the computation is completed, the processors are returned to the pool for use by other users.
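The compilation example above can be sketched as follows. This is a minimal simulation, not a real run server: the `RunServer` class, the processor names, and `compile_segment` are hypothetical, and threads from Python's standard `concurrent.futures` module stand in for the pooled processors.

```python
from concurrent.futures import ThreadPoolExecutor


class RunServer:
    """Hypothetical run server that hands out processors from a pool."""

    def __init__(self, processor_ids):
        self.free = list(processor_ids)

    def allocate(self, count):
        if count > len(self.free):
            raise RuntimeError("not enough free processors in the pool")
        granted, self.free = self.free[:count], self.free[count:]
        return granted

    def release(self, processor_ids):
        self.free.extend(processor_ids)  # processors go back to the pool


def compile_segment(processor_id, segment):
    # Placeholder for real compilation of one independent segment.
    return f"{segment}.o (built on {processor_id})"


def compile_program(server, segments):
    if not segments:
        return []
    granted = server.allocate(len(segments))  # one processor per segment
    try:
        with ThreadPoolExecutor(max_workers=len(granted)) as executor:
            # map() pairs each processor with one segment, keeping input order.
            return list(executor.map(compile_segment, granted, segments))
    finally:
        server.release(granted)


server = RunServer([f"cpu{i}" for i in range(4)])
objects = compile_program(server, ["lexer", "parser", "codegen"])
```

The `finally` clause mirrors the last step in the text: the processors are returned to the pool whether or not the job succeeds.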
In the processor-pool model there is no concept of a home machine. That is, a user does not log onto a particular machine but onto the system as a whole. This is in contrast to the other models, in which each user has a home machine onto which he or she logs and on which most of his or her programs run by default. Compared to the workstation-server model, the processor-pool model allows better utilization of the available processing power of a distributed computing system.

Example: Amoeba and the Cambridge Distributed Computing System.

5. Hybrid Model: To combine the advantages of the workstation-server and processor-pool models, a hybrid model may be used to build a distributed computing system. The hybrid model is based on the workstation-server model but with the addition of a pool of processors. The processors in the pool can be allocated dynamically for computations that are too large for workstations or that require several computers concurrently for efficient execution. This model gives guaranteed response to interactive jobs by allowing them to be processed on the local workstations of the users. It is more expensive to implement than the workstation-server and processor-pool models.

Advantages & Disadvantages:

1. Inherently Distributed Applications: Distributed computing systems come into existence in some very natural ways. Several applications are inherently distributed in nature and require a distributed computing system for their realization. For instance, in an employee database of a nationwide organization, the data pertaining to a particular employee are generated at the employee’s branch office, and in addition to the global need to view the entire database, there is a local need for frequent and immediate access to locally generated data at each branch office.
Examples: a worldwide airline reservation system, a computerized banking system in which a customer can deposit or withdraw money from his or her account at any branch of the bank, and a factory automation system controlling robots and machines all along an assembly line.

2. Information Sharing among Distributed Users: Information generated by one user can be easily and efficiently shared by the users working at other nodes of the system. For example, a project can be performed by two or more users who are geographically far from each other but whose computers are part of the same distributed computing system. Although the users are geographically separated, they can work in cooperation, for example by transferring the files of the project, logging onto each other’s remote computers to run programs, and exchanging messages by electronic mail to coordinate the work.

3. Resource Sharing: Information is not the only thing that can be shared in a distributed computing system. Software resources such as software libraries and databases, as well as hardware resources such as printers, hard disks, and plotters, can also be shared very effectively among all the computers and users of a single distributed computing system. For example, in a distributed computing system based on the workstation-server model, the workstations may have no disk or only a small disk (10-20 megabytes) for temporary storage, and access to permanent files on a large disk can be provided to all the workstations by a single file server.

4. Better Price-Performance Ratio: With the rapidly increasing power and falling price of microprocessors, combined with the increasing speed of communication networks, distributed computing systems potentially have a much better price-performance ratio than a single large centralized system.
For example, we saw how a small number of CPUs in a distributed computing system based on the processor-pool model can be effectively used by a large number of users from inexpensive terminals, giving a fairly high price-performance ratio compared to either a centralized time-sharing system or a personal computer.

5. Shorter Response Times and Higher Throughput: Distributed computing systems are expected to have better performance than single-processor centralized systems. The two most commonly used performance metrics are response time and throughput of user processes. The multiple processors of a distributed computing system can be utilized to provide shorter response times and higher throughput than a single-processor centralized system.

6. Higher Reliability: Reliability refers to the degree of tolerance against errors and component failures in a system. A reliable system prevents loss of information even in the event of component failures. The multiplicity of storage devices and processors in distributed computing systems allows the maintenance of multiple copies of critical information within the system and the redundant execution of important computations to protect them against catastrophic failures.

7. Extensibility and Incremental Growth: An advantage of distributed computing systems is that they are capable of incremental growth. That is, it is possible to gradually extend the power and functionality of a distributed computing system by simply adding resources to the system as and when the need arises.

8. Better Flexibility in Meeting Users’ Needs: Different types of computers are usually more suitable for performing different types of computations. For example, computers with ordinary power are suitable for ordinary data processing jobs, whereas high-performance computers are more suitable for complex mathematical computations.

Design Issues with Distributed Systems

Design issues that arise specifically from the distributed nature of the application:
• Transparency
• Communication
• Performance & scalability
• Heterogeneity
• Openness
• Reliability & fault tolerance
• Security

Transparency
☞ How to achieve the single system image?
☞ How to "fool" everyone into thinking that the collection of machines is a "simple" computer?
• Access transparency: Local and remote resources are accessed using identical operations.
• Location transparency: Users cannot tell where hardware and software resources (CPUs, files, databases) are located; the name of the resource shouldn’t encode the location of the resource.
• Migration (mobility) transparency: Resources should be free to move from one location to another without having their names changed.
• Replication transparency: The system is free to make additional copies of files and other resources (for purposes of performance and/or reliability) without the users noticing. Example: several copies of a file exist; for a given request, the copy closest to the client is accessed.
• Concurrency transparency: The users will not notice the existence of other users in the system (even if they access the same resources).
• Failure transparency: Applications should be able to complete their task despite failures occurring in certain components of the system.
• Performance transparency: Load variation should not lead to performance degradation. This could be achieved by automatic reconfiguration in response to changes in the load; it is difficult to achieve.

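Location, migration, and replication transparency can be illustrated with a small name service. This is a sketch under assumptions: the `NameService` class, the host names, and the numeric "distance" (a made-up closeness measure seen from a single client) are hypothetical and not part of the notes.

```python
class NameService:
    """Hypothetical directory mapping location-free names to replicas."""

    def __init__(self):
        self._replicas = {}  # logical name -> {host: distance from client}

    def register(self, name, host, distance):
        self._replicas.setdefault(name, {})[host] = distance

    def migrate(self, name, old_host, new_host, distance):
        # The resource moves, but its logical name stays the same.
        del self._replicas[name][old_host]
        self._replicas[name][new_host] = distance

    def resolve(self, name):
        # Replication transparency: pick the closest copy for the client.
        hosts = self._replicas[name]
        return min(hosts, key=hosts.get)


ns = NameService()
ns.register("/reports/q1", "hostA", 5)
ns.register("/reports/q1", "hostB", 2)
first_choice = ns.resolve("/reports/q1")   # closest replica

ns.migrate("/reports/q1", "hostB", "hostC", 9)
second_choice = ns.resolve("/reports/q1")  # same name, new placement
```

The client only ever uses the name `/reports/q1`; which host serves the request, and whether a copy has moved, is decided inside the name service.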
Communication
☞ Components of a distributed system have to communicate in order to interact. This implies support at two levels:
1. Networking infrastructure (interconnections & network software).
2. Appropriate communication primitives and models and their implementation:
• Communication primitives:
  - send
  - receive
  - remote procedure call (RPC)
• Communication models:
  - Client-server communication: implies a message exchange between two processes: the process which requests a service and the one which provides it;
  - Group multicast: the target of a message is a set of processes which are members of a given group.

Performance and Scalability
Several factors influence the performance of a distributed system:
• The performance of individual workstations.
• The speed of the communication infrastructure.
• The extent to which reliability (fault tolerance) is provided (replication and preservation of coherence imply large overheads).
• Flexibility in workload allocation: for example, idle processors (workstations) could be allocated automatically to a user’s task.

Scalability
The system should remain efficient even with a significant increase in the number of users and resources connected:
- the cost of adding resources should be reasonable;
- performance loss with an increased number of users and resources should be controlled;
- software resources should not run out (number of bits allocated to addresses, number of entries in tables, etc.).

Heterogeneity
☞ Distributed applications are typically heterogeneous:
- different hardware: mainframes, workstations, PCs, servers, etc.;
- different software: UNIX, MS Windows, IBM OS/2, real-time OSs, etc.;
- unconventional devices: teller machines, telephone switches, robots, manufacturing systems, etc.;
- diverse networks and protocols: Ethernet, FDDI, ATM, TCP/IP, Novell Netware, etc.

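Returning to the communication primitives and models listed under Communication, the following sketch shows send/receive, an RPC-style client-server exchange built on top of them, and a group multicast. It is a single-machine simulation: the `Channel` class, `rpc_add`, and `multicast` are hypothetical names, and an in-memory queue stands in for the network.

```python
import queue
import threading


class Channel:
    """Hypothetical message channel offering send and receive."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, message):
        self._q.put(message)

    def receive(self):
        return self._q.get()  # blocks until a message arrives


def server_loop(requests):
    # Server process: receive a request, perform the service, send a reply.
    while True:
        message = requests.receive()
        if message is None:  # shutdown signal
            break
        reply_to, operation, args = message
        reply_to.send(sum(args) if operation == "add" else None)


def rpc_add(requests, a, b):
    # Client stub: looks like a local call, but is a request/reply exchange.
    reply = Channel()
    requests.send((reply, "add", (a, b)))
    return reply.receive()


def multicast(group, message):
    # Group multicast: deliver the same message to every group member.
    for member in group:
        member.send(message)


requests = Channel()
server = threading.Thread(target=server_loop, args=(requests,))
server.start()
result = rpc_add(requests, 2, 3)
requests.send(None)
server.join()

group = [Channel(), Channel(), Channel()]
multicast(group, "hello")
delivered = [member.receive() for member in group]
```

The client stub hides the message exchange behind an ordinary function call, which is the essential idea of a remote procedure call.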
Openness
☞ One of the important features of distributed systems is openness and flexibility:
- every service is equally accessible to every client (local or remote);
- it is easy to implement, install and debug new services;
- users can write and install their own services.
☞ Key aspects of openness:
- Standard interfaces and protocols (like Internet communication protocols)
- Support of heterogeneity (by adequate middleware, like CORBA)

Reliability and Fault Tolerance
☞ One of the main goals of building distributed systems is improvement of reliability.

Availability: If machines go down, the system should work with the reduced amount of resources.
• There should be a very small number of critical resources; critical resources are resources which have to be up in order for the distributed system to work.
• Key pieces of hardware and software (critical resources) should be replicated ⇒ if one of them fails, another one takes over (redundancy).

Data on the system must not be lost, and copies stored redundantly on different servers must be kept consistent.
• The more copies kept, the better the availability, but keeping consistency becomes more difficult.

Fault tolerance is a main issue related to reliability: the system has to detect faults and act in a reasonable way:
• mask the fault: continue to work with possibly reduced performance but without loss of data/information;
• fail gracefully: react to the fault in a predictable way and possibly stop functionality for a short period, but without loss of data/information.

Security
Security of information resources:
1. Confidentiality: protection against disclosure to unauthorised persons.
2. Integrity: protection against alteration and corruption.
3. Availability: keep the resource accessible.

The appropriate use of resources by different users has to be guaranteed. Distributed systems should allow communication between programs/users/resources on different computers. Security risks are associated with this free access.
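The replication points made under Reliability and Fault Tolerance can be made concrete with a deliberately naive sketch. The `Replica` and `ReplicatedStore` classes are hypothetical; the store writes to every live replica and reads from the first live one, with no recovery protocol, so it shows both fault masking and the consistency problem the notes mention.

```python
class Replica:
    """Hypothetical server holding one copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicatedStore:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        live = [r for r in self.replicas if r.up]
        if not live:
            raise RuntimeError("no replica available")
        for replica in live:
            replica.data[key] = value  # replicas that are down miss this write

    def read(self, key):
        for replica in self.replicas:
            if replica.up:
                return replica.data[key]  # first live replica answers
        raise RuntimeError("no replica available")


r0, r1 = Replica("r0"), Replica("r1")
store = ReplicatedStore([r0, r1])

store.write("balance", 100)
r0.up = False
masked_read = store.read("balance")  # r1 takes over; the fault is masked

store.write("balance", 80)           # only r1 sees this update
r0.up = True
stale_read = store.read("balance")   # r0 answers with its old copy
```

The last read returns the outdated value because the recovered replica was never brought up to date, which is why a real system needs an explicit mechanism for keeping copies consistent.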