Distributed Systems
Unit 1
HISTORY
1945-1985
− Computers were large and expensive.
− No way to connect them.
− All systems were Centralized Systems.
Mid-1980s
− Powerful microprocessors.
− High-speed computer networks (LANs, WANs).
Then came DISTRIBUTED SYSTEMS.
What are Distributed Systems?
A distributed system is a piece of software that ensures that a collection of independent computers appears to its users as a single coherent system.
Two aspects: (1) independent computers and (2) a single system => middleware.
EXAMPLES
The World Wide Web (WWW) is the biggest example of a distributed system.
Others are:
− The Internet.
− An intranet, which is a portion of the Internet managed by an organization.
WHY DISTRIBUTED SYSTEMS?
− Availability of powerful yet cheap microprocessors (PCs, workstations).
− Continuing advances in communication technology.
ADVANTAGES OF D.S. OVER
CENTRALIZED SYSTEM:
Economics:
A collection of microprocessors offers a better price/performance ratio than mainframes, making it a cost-effective way to increase computing power.
Reliability:
· If one machine crashes, the system as a whole can still survive, giving higher availability and improved reliability.
Speed: a distributed system may have more total computing power than a mainframe.
Ex.: 10,000 CPU chips, each running at 50 MIPS. It is not possible to build a single 500,000 MIPS processor.
Enhanced performance through load distribution.
Incremental growth: Computing power can be added in small increments, which leads to modular expandability.
ADVANTAGES OF D.S. OVER
INDEPENDENT PCs:
Data sharing: allow many users to access a common database.
Resource Sharing: expensive peripherals
like color printers.
Communication: enhance human-to-human communication. E.g.: email, chat.
Flexibility: spread the workload over the
available machines
ORGANIZATION OF D.S.:
A distributed system is organized as middleware.
− The middleware layer extends over multiple machines and offers each application the same interface.
GOALS OF D.S. :
− Resource Sharing.
− Openness.
− Transparency.
− Scalability.
− Concurrency.
RESOURCE SHARING:
With Distributed Systems, it is easier for users to
access remote resources and to share resources
with other users.
− Examples: printers, files, Web pages, etc
A distributed system should also make it easier for
users to exchange information.
Easier resource and data exchange can cause security problems; a distributed system should deal with this problem.
OPENNESS:
The openness of a DS is determined primarily by the degree to which new resource-sharing services can be added and made available for use by a variety of client programs.
TRANSPARENCY:
It hides the fact that the processes and
resources are physically distributed across
multiple computers.
Transparency takes various forms, described in the transparency table later in this unit.
SCALABILITY:
A system is described as scalable if it
remains effective when there is a significant
increase in the number of resources and the
number of users.
Challenges:
Controlling the cost of resources or money.
Controlling the performance loss.
CONCURRENCY:
There is a possibility that several clients
will attempt to access a shared resource at
the same time.
Any object that represents a shared resource in a distributed system must be responsible for ensuring that it operates correctly in a concurrent environment.
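The point above can be made concrete with a minimal local sketch. Ordinary Python threads stand in for remote clients; in a real distributed system the mutual exclusion would come from the server or a lock service guarding the resource:

```python
import threading

# Shared "resource": a counter that several concurrent clients update.
counter = 0
lock = threading.Lock()

def client(increments):
    global counter
    for _ in range(increments):
        with lock:            # only one client may update at a time
            counter += 1

# Four simulated clients access the shared resource concurrently.
threads = [threading.Thread(target=client, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000; without the lock, some updates could be lost
```

Without the lock, two clients can read the same old value of the counter and one increment is lost; the lock is what makes the shared object "operate correctly in a concurrent environment".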
TYPES OF D.S. :
Distributed Computing Systems.
− Cluster Computing Systems.
− Grid Computing Systems.
Distributed Information Systems.
Distributed Pervasive Systems.
DISTRIBUTED COMPUTING
SYSTEMS:
Goal: High performance computing tasks.
Cluster Computing Systems:
− A “supercomputer” built from “off the shelf” computers connected by a high-speed network (usually a LAN).
− Most common use: a single program is run
in parallel on multiple machines
Grid Computing Systems:
− Contrary to clusters, grids are usually
composed of different types of computers
(hardware, OS, network, security, etc.)
− Resources from different organizations are
brought together to allow collaboration
− Examples: SETI@home, WWW…
DISTRIBUTED INFORMATION SYSTEMS:
Goal: Distribute information across several servers.
− Remote processes called Clients access the
servers to manipulate the information
− Different communication models are used. The most usual are RPC (Remote Procedure Call) and the object-oriented RMI (Remote Method Invocation).
Often associated with transaction systems.
− Examples: banks, travel agencies, car rental companies, etc.
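A hedged sketch of the RPC style such systems use, built on Python's standard xmlrpc modules. The "bank" service and its get_balance function are invented for illustration; a real system would add authentication, transactions, and error handling:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: expose a balance lookup over XML-RPC.
balances = {"alice": 120}

def get_balance(name):
    return balances.get(name, 0)

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
port = server.server_address[1]          # port 0 = let the OS pick one
server.register_function(get_balance)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the remote procedure is called like a local function.
client = ServerProxy(f"http://127.0.0.1:{port}")
balance = client.get_balance("alice")
print(balance)  # 120
server.shutdown()
```

The key property of RPC is visible on the client side: `client.get_balance("alice")` looks like an ordinary function call, although the work is done in the server process.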
DISTRIBUTED PERVASIVE SYSTEMS:
− These are the distributed systems involving mobile
and embedded computer devices like Small,
wireless, battery-powered devices (PDA’s, smart
phones, sensors, wireless surveillance cams,
portable ECG monitors, etc.)
− These systems are characterized by their “instability” when compared to more “traditional” distributed systems.
Pervasive Systems are all around us, and ideally
should be able to adapt to the lack of human
administrative control:
Automatically connect to a different network;
Discover services and react accordingly;
Automatic self configuration (E.g.: UPnP –
Universal Plug and Play)…
− Examples: Home Systems, Electronic Health Care
Systems, Sensor Networks, etc.
Goals of Distributed systems
There are various important goals that must be met to build a distributed system worth the effort. A distributed system should easily connect users to resources, hide the fact that resources are distributed across a network, be open, and be scalable.
1. Connecting Users and Resources :
The main goal of a distributed system is to make it easy for users to
access remote resources, and share them with other users in a
controlled manner. Resources can be virtually anything, typical
examples of resources are printers, storage facilities, data, files,
web pages, and networks. There are many reasons for sharing
resources. One reason is economics.
2. Transparency :
An important goal of a distributed system is to hide the fact that its processes and resources are physically distributed across multiple computers. A distributed system that is capable of presenting itself to users and applications as if it were a single computer system is called transparent.
The concept of transparency can be applied to many aspects of a
distributed system as shown in the table.
Different Forms of Transparency –

S.No.  Transparency  Description
(1)    Access        Hide differences in data representation and how a resource is accessed.
(2)    Location      Hide where a resource is located.
(3)    Migration     Hide that a resource may move to another location.
(4)    Relocation    Hide that a resource may be moved while in use.
(5)    Replication   Hide that a resource is replicated.
(6)    Concurrency   Hide that a resource may be shared by several users at once.
(7)    Failure       Hide the failure and recovery of a resource.
(8)    Persistence   Hide whether a resource is in memory or on disk.
3. Openness :
Another important goal of distributed systems is openness. An open
distributed system is a system that offers services according to standard
rules that describe the syntax and semantics of those services. In
computer networks, standard rules govern the format, content, and
meaning of messages sent and received; such rules are formalized in
protocols. In distributed systems, services are typically specified
through interfaces, which are often described in an interface
definition language (IDL). Interface definitions written in an IDL almost
always capture only the syntax of services. They accurately specify
the names of the functions that are available, together with the types of
their parameters, return values, possible exceptions that can be raised,
and so on.
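As a loose analogy for what an IDL captures, here is a Python sketch using abstract base classes: the interface fixes only names, parameters, and return types, not behaviour. PrintService is a made-up example service, not part of any real IDL:

```python
from abc import ABC, abstractmethod

class PrintService(ABC):
    """Interface only: names and signatures, like an IDL definition."""

    @abstractmethod
    def submit(self, document: bytes) -> int:
        """Queue a document for printing and return a job id."""

# One possible implementation behind the interface; the interface
# says nothing about how submit() actually works.
class LocalPrintService(PrintService):
    def __init__(self):
        self.jobs = []

    def submit(self, document: bytes) -> int:
        self.jobs.append(document)
        return len(self.jobs) - 1

svc = LocalPrintService()
print(svc.submit(b"report"))  # 0: the first job id
```

As with an IDL, any number of different implementations can sit behind the same interface, and clients written against the interface work with all of them.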
4. Scalability :
The unmistakable trend in distributed systems is towards larger
systems. This observation has implications for distributed file
system design. Algorithms that work well for systems with 100
machines may work poorly for systems with 1,000 machines and not at all
for systems with 10,000 machines. For starters, centralized
algorithms do not scale well. If opening a file requires contacting a
single centralized server to record the fact that the file is open, then
the server will eventually become a bottleneck as the system
grows.
5. Reliability :
One of the original goals of building distributed systems was to make them
more reliable than single-processor systems. The idea is that if
some machine goes down, another machine takes over its work. In
other words, theoretically the reliability of the overall system can be
a Boolean OR of the component reliabilities. For example, with four
file servers, each with a 0.95 chance of being up at any instant, the
probability of all four being down simultaneously is 0.05^4, about
0.000006, so the probability of at least one being available is about
(1 - 0.000006) = 0.999994, far better than any individual server.
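The reliability arithmetic above can be checked directly. The formula assumes the servers fail independently:

```python
# Probability that at least one of n replicas is up, each up with
# probability p independently (the "Boolean OR" of component reliability).
def availability(p, n):
    return 1 - (1 - p) ** n

# Four servers, each up 95% of the time, as in the example above.
print(availability(0.95, 4))   # about 0.999994
print(availability(0.95, 1))   # 0.95: a single server is far worse
```

Adding replicas drives the "all down" probability down geometrically, which is why even modest per-server availability yields very high system availability.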
6. Performance :
Building a transparent, flexible, reliable distributed system is
useless if it is slow as molasses. In particular, running an application
on a distributed system should not be appreciably worse than running
the same application on a single processor. Various performance
metrics can be used: response time is one, but so are throughput,
system utilization, and the amount of network capacity consumed.
Furthermore, the results of any benchmark are often highly
dependent on the nature of the benchmark. A benchmark that involves
a large number of independent, highly CPU-bound computations
gives radically different results than a benchmark that consists
of scanning a single large file for some pattern.
Evolution of Distributed Computing Systems
In this article, we will trace the history of distributed computing systems from the mainframe
era to the present day. It is important to understand the history of anything in order to track
how far we have progressed. Distributed computing is all about the evolution from
centralization to decentralization; it depicts how centralized systems evolved over time
towards decentralization. We had centralized systems like mainframes in the mid-1950s, but
now we use decentralized systems like edge computing and containers.
1. Mainframe: In the early years of computing, between 1960-1967, mainframe-based
computing machines were considered the best solution for processing large-scale data, as
they provided time-sharing to local clients who interacted with teletype terminals. This type
of system conceptualized the client-server architecture. Clients connected to and made
requests of the server, and the server processed these requests, enabling a single time-sharing
system to send multiple resources over a single medium to its clients. The major drawback
was that mainframes were quite expensive, and that led to the innovation of early disk-based
storage and transistor memory.
2. Cluster Networks: In the early 1970s, packet switching was developed and cluster
computing emerged as an alternative to mainframe systems, although it was still expensive.
In cluster computing, the underlying hardware consists of a collection of
similar workstations or PCs, closely connected by means of a high-speed local-area network,
where each node runs the same operating system. Its purpose was to achieve parallelism.
During 1967-1974, we also saw the creation of ARPANET, an early network that enabled
global message exchange, allowing services to be hosted on remote machines across
geographic bounds, independent of a fixed programming model. The TCP/IP protocol, which
facilitated datagram and stream-oriented communication over a packet-switched
autonomous network of networks, also came into existence. Communication was mainly
through datagram transport.
3. Internet & PCs: During this era, the evolution of the internet took place. New
technology such as TCP/IP had begun to transform the Internet into several connected
networks, linking local networks to the wider Internet. Thus, the number of hosts connected
to the network began to grow rapidly, and centralized naming systems such as
HOSTS.TXT could no longer scale. Hence the Domain Name System (DNS) came into
existence in 1985, translating hosts' domain names into IP addresses. Early
GUI-based computers utilizing WIMP (windows, icons, menus, pointers) interfaces were
developed, which made computing feasible within the home, providing applications such as
video games and web browsing to consumers.
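The name-to-address translation DNS performs can be tried with Python's standard socket module. Resolving "localhost" is used here only to keep the example self-contained; the resolver may consult DNS or the local hosts file, the descendant of HOSTS.TXT:

```python
import socket

# Translate a host name into an IPv4 address, the job DNS took over
# from the centralized HOSTS.TXT file in 1985.
address = socket.gethostbyname("localhost")
print(address)  # typically 127.0.0.1, the IPv4 loopback address
```

For a real domain name the same call triggers a DNS query: the hierarchical, distributed design of DNS is exactly what let naming scale past what a single shared file could handle.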
4. World Wide Web: During the 1980s-1990s, the creation of HyperText Transfer
Protocol (HTTP) and HyperText Markup Language (HTML) resulted in the first web
browsers, websites, and web servers. The Web was developed by Tim Berners-Lee at CERN.
Standardization of TCP/IP provided the infrastructure for interconnected networks of networks
known as the World Wide Web (WWW). This led to tremendous growth in the number
of hosts connected to the Internet. As the number of PC-based application programs running
on independent machines grew, communication between such application
programs became extremely complex and posed a growing challenge for
application-to-application interaction. With the advent of network computing, which enables
remote procedure calls (RPCs) over TCP/IP, a widely accepted way for
application software to communicate emerged. In this era, servers provided resources described by
Uniform Resource Locators (URLs). Software applications running on a variety of hardware
platforms, operating systems, and networks faced challenges when required to communicate with
each other and share data. These demanding challenges led to the concept of distributed
computing applications.
5. P2P, Grids & Web Services: Peer-to-peer (P2P) computing or networking is a distributed
application architecture that partitions tasks or workloads between peers without
requiring a central coordinator. Peers share equal privileges; in a P2P network, each
node acts as both a client and a server. P2P file sharing was introduced in 1999 when American
college student Shawn Fanning created the music-sharing service Napster. P2P networking
enables a decentralized internet. With the introduction of grid computing, multiple tasks can
be completed by computers jointly connected over a network. Grid computing makes use of a data
grid, i.e., a set of computers that can directly interact with each other to perform similar tasks by
using middleware. During 1994-2000, we also saw the creation of effective x86
virtualization. With the introduction of web services, platform-independent communication
was established, using XML-based information exchange systems that use the Internet
for direct application-to-application interaction. Through web services, Java can talk with
Perl, and Windows applications can talk with Unix applications.
6. Cloud, Mobile & IoT: Cloud computing emerged from the convergence of cluster
technology, virtualization, and middleware. Through cloud computing, you can manage your
resources and applications online over the internet without hosting them on your own hard
drive or server. A major advantage is that these resources can be accessed by anyone from
anywhere in the world. Many cloud providers offer subscription-based services: after paying
for a subscription, customers can access all the computing resources they need. Customers
no longer need to update outdated servers, buy hard drives when they run out of storage,
install software updates, or buy software licenses; the vendor does all that for them. Mobile
computing allows us to transmit data, such as voice and video, over a wireless network; we
no longer need to connect our mobile phones with switches. Some of the most common forms
of mobile computing are smart cards, smartphones, and tablets. IoT also began to emerge
from mobile computing, utilizing sensors, processing ability, software, and
other technologies that connect and exchange data with other devices and systems over the
Internet.
The evolution of Application Programming Interface (API) based communication over the
REST model was needed to achieve scalability, flexibility, portability, caching, and
security. Instead of implementing these capabilities in each and every API separately, the
requirement arose to have a common component that applies these features on top of the
API. This requirement led to the evolution of API management platforms, which today have
become one of the core features of any distributed system. Instead of considering one computer as
one computer, the idea of having multiple systems within one computer came into existence.
This led to the idea of virtual machines, where the same computer can act as multiple
computers and run them all in parallel. Even though this was a good idea, it was not
the best option when it comes to resource utilization of the host computer. Virtualization
products available today include VMware Workstation, Microsoft Hyper-V, and Oracle
VirtualBox.
7. Fog and Edge Computing: When the data produced by mobile computing and IoT
services started to grow tremendously, collecting and processing that data in real time
was still an issue. This led to the concept of edge computing, in which client data is
processed at the periphery of the network; it is essentially a matter of location. Rather than
moving data across a WAN such as the internet to a centralized data center, which may cause
latency issues, data is processed and analyzed closer to the point where it is created, such as
on a corporate LAN. Fog computing greatly reduces the need for bandwidth by not sending every
bit of information over cloud channels, instead aggregating it at certain access points.
This type of distributed strategy lowers costs and improves efficiency. Companies like IBM
are a driving force behind fog computing. The combination of fog and edge
computing further extends the cloud computing model away from centralized stakeholders
to decentralized multi-stakeholder systems, which are capable of providing ultra-low service
response times and increased aggregate bandwidth.
The idea of using containers became prominent when you could put your application and all
its relevant dependencies into a container image that can be run on any environment whose
host operating system can run containers. This concept became more popular and
improved a lot with the introduction of container-based application deployment. Containers
can act much like virtual machines without the overhead of a separate operating
system per instance. Docker is the most popular container platform and Kubernetes the most
popular container orchestrator; together they provide the facility to run containers in large
clusters and to allow communication between services running in containers.
Today, distributed systems are programmed by application programmers while the underlying
infrastructure management is done by a cloud provider. This is the current state of distributed
computing, and it keeps on evolving.
Design Issues of Distributed Systems
The distributed information system is defined as “a number of interdependent computers
linked by a network for sharing information among them”. A distributed information system
consists of multiple autonomous computers that communicate or exchange information
through a computer network. Design issues of distributed system –
1. Heterogeneity: Heterogeneity applies to the network, computer hardware,
operating systems, and the implementations of different developers. A key component
of the heterogeneous distributed client-server environment is middleware.
Middleware is a set of services that enables applications and end-users to interact
with each other across a heterogeneous distributed system.
2. Openness: The openness of the distributed system is determined primarily by the
degree to which new resource-sharing services can be made available to the users.
Open systems are characterized by the fact that their key interfaces are published.
It is based on a uniform communication mechanism and published interface for
access to shared resources. It can be constructed from heterogeneous hardware
and software.
3. Scalability: The system should remain efficient even with a
significant increase in the number of users and resources connected.
4. Security: Security of an information system has three components: confidentiality,
integrity, and availability. Encryption protects shared resources and keeps sensitive
information secret when transmitted.
5. Failure Handling: When faults occur in hardware or software, programs
may produce incorrect results or may stop before completing the intended
computation, so corrective measures should be implemented to handle such
cases. Failure handling is difficult in distributed systems because failure is
partial, i.e., some components fail while others continue to function.
6. Concurrency: There is a possibility that several clients will attempt to access a
shared resource at the same time. Multiple users make requests on the same
resources, i.e read, write, and update. Each resource must be safe in a concurrent
environment. Any object that represents a shared resource in a distributed system
must ensure that it operates correctly in a concurrent environment.
7. Transparency: Transparency ensures that the distributed system is perceived
by users and application programmers as a single entity rather than as a
collection of cooperating autonomous systems. The user should be unaware of
where services are located, and the transfer from a local machine to a remote
one should be transparent.
The various models that are used for building distributed computing systems can be classified
into 5 categories:
1. Minicomputer Model
The minicomputer model is a simple extension of the centralized time-sharing system.
A distributed computing system based on this model consists of a few minicomputers
interconnected by a communication network, where each minicomputer usually has
multiple users simultaneously logged on to it.
Several interactive terminals are connected to each minicomputer. Each user logged on
to one specific minicomputer has remote access to other minicomputers.
The network allows a user to access remote resources that are available on some
machine other than the one onto which the user is currently logged. The minicomputer
model may be used when resource sharing with remote users is desired.
The early ARPANET is an example of a distributed computing system based on the
minicomputer model.
2. Workstation Model
A distributed computing system based on the workstation model consists of several
workstations interconnected by a communication network.
An organization may have several workstations located throughout an infrastructure,
where each workstation is equipped with its own disk and serves as a single-user
computer.
In such an environment, at any one time a significant proportion of the workstations
are idle, which results in the waste of large amounts of CPU time.
Therefore, the idea of the workstation model is to interconnect all these workstations
by a high-speed LAN so that idle workstations may be used to process jobs of users
who are logged onto other workstations and do not have sufficient processing power at
their own workstations to get their jobs processed efficiently.
Example: the Sprite system & Xerox PARC.
3. Workstation-Server Model
The workstation model is a network of personal workstations, each having its own disk
and a local file system.
A workstation with its own local disk is usually called a diskful workstation, and a
workstation without a local disk is called a diskless workstation. Diskless workstations
have become more popular in network environments than diskful workstations,
making the workstation-server model more popular than the workstation model for
building distributed computing systems.
A distributed computing system based on the workstation-server model consists of a
few minicomputers and several workstations interconnected by a communication
network.
In this model, a user logs onto a workstation called his or her home workstation.
Normal computation activities required by the user's processes are performed at the
user's home workstation, but requests for services provided by special servers are sent
to a server providing that type of service, which performs the user's requested activity
and returns the result to the user's workstation.
Therefore, in this model, the user's processes need not be migrated to the server
machines to get the work done by those machines.
Example: The V-System.
4. Processor-Pool Model:
The processor-pool model is based on the observation that most of the time a user
does not need any computing power, but once in a while the user may need a very
large amount of computing power for a short time.
Therefore, unlike the workstation-server model, in which a processor is allocated to
each user, in the processor-pool model the processors are pooled together to be shared
by the users as needed.
The pool of processors consists of a large number of microcomputers and
minicomputers attached to the network.
Each processor in the pool has its own memory to load and run a system program or an
application program of the distributed computing system.
In this model no home machine is present, and the user does not log onto any
particular machine.
This model offers better utilization of processing power and greater flexibility.
Example: Amoeba & the Cambridge Distributed Computing System.
5. Hybrid Model:
The workstation-server model suits environments with a large number of computer
users who perform only simple interactive tasks and execute small programs.
In a working environment that has groups of users who often perform jobs needing
massive computation, the processor-pool model is more attractive and suitable.
To combine the advantages of the workstation-server and processor-pool models, a
hybrid model can be used to build a distributed system.
The processors in the pool can be allocated dynamically for computations that are too
large or require several computers for execution.
The hybrid model gives guaranteed response to interactive jobs by allowing them to be
processed on the local workstations of the users.
Introduction to Distributed Computing Environment (DCE)
The benefits of distributed systems have been widely recognized: scalability, reliability,
performance, flexibility, transparency, resource sharing, geo-distribution, etc. In order to
exploit these advantages, appropriate support and an environment are needed that facilitate
the development and execution of distributed applications.
A distributed application is a program that runs on more than one machine and communicates
through a network. It consists of separate parts that execute on different nodes of the network
and cooperate in order to achieve a common goal. It uses the client-server model.
Distributed Computing Environment(DCE) is an integrated set of services and tools which
are used for building and running Distributed Applications. It is a collection of integrated
software components/frameworks that can be installed as a coherent environment on top of
the existing Operating System and serve as a platform for building and running Distributed
Applications.
Using DCE, users can run applications and access data at remote servers. Application
programmers or clients need not be aware of where their programs will run or where the data
that they want to access will be located.
DCE was developed by the Open Software Foundation(OSF) using software technologies
contributed by some of its member companies which are now popularly known as The Open
Group.
DCE framework/Services include:
Remote Procedure Call (RPC): A call made when a computer program
wants to execute a subroutine on a different computer (another computer on a
shared network).
Distributed File System (DFS): Provides a transparent way of accessing a file
in the system, in the same way as if it were stored locally.
Directory Service: It is used to keep track of the location of virtual resources in the
distributed system. These resources include files, printers, servers, scanners,
and other machines. Processes request resources through this service and are
provided with them conveniently, without being aware of the actual location of
the resources.
Security Service: It allows the process to check for User Authenticity. Only an
authorized person can have access to protected and secured resources. It allows
only an authorized computer on a network of Distributed Systems to have access
to secured resources.
Distributed Time Service: Inter-Process Communication between different
system components requires synchronization so that communication takes place
in a designated order only. This service is responsible for maintaining a global
clock and hence synchronizing the local clocks with the notion of time.
Thread Service: The Thread Service provides the implementation of lightweight
processes (threads). Helps in the synchronization of multiple threads within a
shared address space.
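To illustrate what a time service such as DCE's must do, here is a sketch of Cristian-style clock synchronisation under stated assumptions: server_time is a stand-in for a real network call to a time server, and the reply is assumed to take half the round-trip delay (the usual simplification):

```python
import time

SERVER_SKEW = 2.5  # pretend the time server's clock is 2.5 s ahead of ours

def server_time():
    # Stand-in for querying a remote time server over the network.
    return time.time() + SERVER_SKEW

def estimate_offset():
    t0 = time.time()          # moment the request is sent
    ts = server_time()        # server's reported time
    t1 = time.time()          # moment the reply is received
    # Assume the reply took half the round trip to arrive; the offset is
    # how far our clock lags the server's at time t1.
    return ts + (t1 - t0) / 2 - t1

offset = estimate_offset()
print(round(offset, 1))  # close to 2.5, the skew we simulated
```

A client would then slew its local clock by this offset. Real time services repeat the exchange, keep the samples with the smallest round trip, and bound the error by the measured delay.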
DCE Architecture
DCE supports the structuring of distributed computing systems into so-called cells, which
consist of three types of machines: user, administrator, and server. This is done to keep the size
of the administration domain manageable. A cell is basically a set of nodes that are managed
together by one authority.
Cell boundaries represent security firewalls; access to resources in a foreign cell
requires special authentication and authorization procedures that are different from secure
intra-cell interactions.
The highest privileges within a cell are assigned to a role called the DCE
cell administrator, which has remote control over all system services within the network.
It has privileges over all resources within a Distributed Computing Environment cell.
Major components of a cell:
Security Server, which is responsible for user authenticity.
Cell Directory Server (CDS), the repository of resources.
Distributed Time Server, which provides the clock for synchronization of the entire
cell.
Figure 1: DCE Architecture
Advantages of DCE:
Security
Lower Maintenance Cost
Scalability and Availability