4. The Problem
• Consider this scenario:
– The Large Hadron Collider (LHC) at CERN, once completed,
is expected to provide scientists with a large amount of
data on the behavior of particles
– It poses a technological challenge because it would
produce over 10 petabytes of data a year
– That is roughly equivalent to a stack of CDs 20 km high
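The CD comparison can be checked with a rough calculation. The sketch below assumes a 700 MB CD that is 1.2 mm thick and a decimal petabyte; none of these values come from the slides, so the result is only an order-of-magnitude check.

```python
# Assumed values (not from the slides): 700 MB per CD, 1.2 mm per CD,
# and 1 petabyte = 10**15 bytes.
DATA_PER_YEAR_BYTES = 10 * 10**15
CD_CAPACITY_BYTES = 700 * 10**6
CD_THICKNESS_MM = 1.2


def cd_stack_height_km(data_bytes):
    """Height in km of the stack of CDs needed to hold data_bytes."""
    cds = data_bytes / CD_CAPACITY_BYTES
    return cds * CD_THICKNESS_MM / 1_000_000  # millimetres to kilometres


print(f"{cd_stack_height_km(DATA_PER_YEAR_BYTES):.1f} km")
```

Under these assumptions the stack comes out at about 17 km, which is consistent with the "roughly 20 km" figure.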
5. What we have
• In a review, IBM noted that
– over a 24-hour period, a UNIX server actually uses
less than 10 percent of its capacity
– for mainframes, which ironically are targeted at
specific problems, the figure is only about 40 percent
– for desktops and laptops, the figure is a meager 5
percent
6. Problem Domain
• Ever-increasing demand for huge sources of computing power
for compute-hungry scientific and business processes.
• Existing resources, worth millions, are under-utilized.
This computing power has to be put to use without sacrificing
availability, flexibility, accessibility and other such factors.
7. Introduction to Grid Computing
⮚ A ‘Grid’ is an infrastructure for resource sharing.
⮚ Most Grids use the idle time of thousands or millions of computers
throughout the world.
8. Introduction to Grid Computing
⮚ Grid Computing
❑ Grid computing enables the sharing, selection, and aggregation
of geographically distributed heterogeneous resources for solving
large-scale problems in science, engineering, and commerce
9. Power Grid Analogy
• Electrical/Power Grid
– No need to worry about where the electricity comes from: simply plug in your toaster to get
electrical power
– The infrastructure that makes this possible is called "the power grid", e.g. transmission
lines, power stations, etc.
– Pervasive: electricity is available essentially everywhere
– Utility: you ask for electricity and pay for it
• Grid
– No need to worry about where the computing power you are using comes from: simply plug
your computer into the Internet to get it
– The infrastructure that makes this possible is called "the Grid", i.e. it links PCs, servers,
and network elements
– The Grid is meant to be pervasive: accessible through web services or a portal
– Utility: you ask for computing power or storage capacity, you get it, and you pay for it
10. What is Grid Computing?
• Carl Kesselman and Ian Foster wrote this definition in their
book "The Grid: Blueprint for a New Computing
Infrastructure":
"A computational grid is a hardware and software
infrastructure that provides dependable, consistent,
pervasive, and inexpensive access to high-end computational
capabilities."
11. What is Grid Computing?
• IBM has put forward the following statement on grid
computing:
"Grid is the ability, using a set of open standards and
protocols, to gain access to applications and data,
processing power, storage capacity and other computing
resources over the Internet."
12. Evolution
• 1996-1999 – Experimentation and core grid
protocols – Globus Toolkit 1.0
• 1999 – Data grid and Globus Toolkit 2.0
• Medium-scale data management and analysis
• 2001 – OGSA with Globus Toolkit 3.0, integration
with Web services and resource virtualization,
plus a number of higher-level services
• Problems – lack of common vocabulary,
common infrastructure formulation, common
intercommunication protocols and common
interfaces or APIs.
• New services include common vocabulary and
systemization
• 2003 onwards - more extensive
standardization and computing
[Figure: evolution of grids over time through an early stage, a second stage and a third stage. Labels: personal devices, supercomputers, local cluster computing, local data grids, enterprise cluster/grid, partner grids, global grids.]
13. Types of Grid
• Computational grid
– Setting aside resources specifically for computing power
– Most of the machines are high-performance servers
• Scavenging grid
– Used with large numbers of desktop machines
– Machines are scavenged for available CPU cycles and other
resources
– Owners of the desktop machines are usually given control
over when their resources are available to participate in
the grid
14. Types of Grid
• Data grid
– Housing and providing access to data across multiple
organizations
– Users are not concerned with where this data is located as
long as they have access to the data
15. What Grid computing can do?
• Every computer can access the resources of every
other computer belonging to the network
• A scientist studying proteins logs into a computer
and uses an entire network of computers to analyze data
• A businessman accesses his company's network through a PDA in order to forecast
the future of a particular stock
• An Army official accesses and coordinates computer resources on three different
military networks to formulate a battle strategy
• All of these scenarios have one thing in common: They rely on a concept called grid
computing
17. What Grid computing can do?
Parallel CPU capacity
• Potential for massive parallel CPU capacity, useful not only for pure
scientific needs but also for industries such as biomedicine, financial
modeling, oil exploration, motion picture animation, and many others
• Applications have been written to use algorithms that can be
partitioned into independently running parts
• A CPU-intensive grid application can be thought of as many smaller
“subjobs,” each executing on a different machine. The less the subjobs need
to communicate with each other, the more “scalable” the application is
• Barriers often exist to perfect scalability
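The idea of partitioning work into independent subjobs can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and local threads stand in for grid machines, whereas a real grid would dispatch each subjob through its own middleware.

```python
from concurrent.futures import ThreadPoolExecutor


def subjob(chunk):
    """One independent subjob: it needs only its own chunk of the input."""
    return sum(n * n for n in chunk)


def split(data, parts):
    """Partition data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def run_on_grid(data, machines=4):
    """Run one subjob per 'machine' and combine the partial results."""
    # Assumption: a thread pool stands in for the machines of a grid.
    with ThreadPoolExecutor(max_workers=machines) as pool:
        partial_results = list(pool.map(subjob, split(data, machines)))
    return sum(partial_results)


print(run_on_grid(list(range(1000))))
```

Because no subjob reads another subjob's data, adding machines only changes how the input is split. The combining step at the end is one example of the serial work that limits perfect scalability.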
18. What Grid computing can do?
Virtual resources and virtual organizations for collaboration
• Enables and simplifies collaboration among a wider audience and allows
heterogeneous systems to work together to form the image of a large virtual
computing system offering a variety of virtual resources
• Sharing starts in the form of files or data
• “Data grid”
– Larger capacity than a single system
– Spanning data across machines can improve data transfer rates through the use of striping
techniques
– Data can be duplicated to serve as a backup
• Sharing is not limited to files: it extends to resources such as equipment, software,
services, and licenses
[Figure: virtualization gives a simple view of heterogeneous, dispersed resources]
19. What Grid computing can do?
Access to additional resources
• In addition to CPU and storage resources, a grid can provide access to
increased quantities of other resources and to special equipment
• Expensive licensed software
• Special devices like remote printers
• Special equipment
• Remote medical diagnostic tools
• Robotic surgery tools with two-way interaction from a distance
20. What Grid computing can do?
Resource balancing
• Can offer a resource balancing effect by
scheduling grid jobs on machines with low
utilization
• Larger peak loads are handled in two
ways:
– An unexpected peak can be routed to
relatively idle machines in the grid.
– If the grid is already fully utilized, the
lowest priority work being performed
on the grid can be temporarily
suspended or even cancelled
• Jobs are migrated to less busy parts of the grid to
balance loads
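The two ways of handling a peak can be sketched as a small placement routine. This is an illustration under stated assumptions, not the behavior of any particular grid product: the `Machine` class, the slot-based notion of capacity (at least one slot per machine), and the rule that a larger number means higher priority are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    capacity: int                              # job slots, assumed >= 1
    jobs: list = field(default_factory=list)   # (priority, job_name) pairs

    @property
    def utilization(self):
        return len(self.jobs) / self.capacity


def place(job_name, priority, machines):
    """Place a job; return the name of a suspended job, or None."""
    # Case 1: route the job to the least utilized machine if it has room.
    target = min(machines, key=lambda m: m.utilization)
    if target.utilization < 1.0:
        target.jobs.append((priority, job_name))
        return None
    # Case 2: the grid is full, so suspend the lowest-priority job anywhere.
    victim_machine = min(machines, key=lambda m: min(m.jobs))
    victim = min(victim_machine.jobs)
    if victim[0] >= priority:
        raise RuntimeError("no lower-priority work to suspend")
    victim_machine.jobs.remove(victim)
    victim_machine.jobs.append((priority, job_name))
    return victim[1]


grid = [Machine("a", 1), Machine("b", 1)]
place("j1", 5, grid)           # goes to idle machine a
place("j2", 1, grid)           # goes to idle machine b
print(place("j3", 3, grid))    # grid is full: the lowest-priority job is suspended
```

A real scheduler would also keep the suspended job in a queue so it can resume later; that part is left out here.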
21. What Grid computing can do?
Reliability
• High-end conventional computing systems use expensive hardware to increase
reliability; duplicating high-reliability components adds cost
• Alternative approach: rely more on software technology than on expensive hardware
• Grid management software can automatically resubmit jobs to other machines on
the grid when a failure is detected
• In critical real-time situations, multiple copies of important jobs can be run on different
machines throughout the grid
• Autonomic computing: software that automatically heals problems in the grid
even before an operator or manager is aware of them
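Automatic resubmission can be sketched as a loop that tries the next machine whenever a run fails. The names `run_with_resubmit` and `flaky_run`, and the machine names, are hypothetical; real grid management software would detect failures through its own monitoring rather than through a Python exception.

```python
def run_with_resubmit(job, machines, run):
    """Try the job on each machine in turn until one run succeeds."""
    failures = []
    for machine in machines:
        try:
            return machine, run(machine, job)
        except Exception as err:  # any failure triggers resubmission
            failures.append((machine, err))
    raise RuntimeError(f"job {job!r} failed on all machines: {failures}")


def flaky_run(machine, job):
    """Stand-in for remote execution: node-a is assumed to be down."""
    if machine == "node-a":
        raise ConnectionError("node-a is down")
    return f"{job} done on {machine}"


print(run_with_resubmit("render", ["node-a", "node-b"], flaky_run))
```

For the real-time case mentioned above, the same job could instead be started on several machines at once and the first result kept.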
22. What Grid computing can do?
Management
• Offers management of priorities among
different projects
• Administrators can change any number of
policies that affect how the different
organizations might share or compete for
resources
• When maintenance is required, grid work can
be rerouted to other machines without
crippling the projects involved
• Administrators can adjust policies to better
allocate resources
• Autonomic computing can identify important trends throughout the grid,
informing management of those that require attention
23. Types of Resources
Computation
• Computing cycles are provided by the processors of the machines on the
grid
• There are three primary ways to exploit the computation resources:
– The simplest is to run an existing application on an available
machine on the grid rather than locally
– Use an application designed to split its work so that the
separate parts can execute in parallel on different processors
– Run an application that needs to be executed many times on many
different machines in the grid
• Scalability is a measure of how efficiently the multiple processors on a
grid are used
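One common way to put a number on this is parallel efficiency: the speedup divided by the number of processors. The slides do not define a formula, so the sketch below is an assumption about how the measure could be computed; a value of 1.0 corresponds to perfect scalability.

```python
def speedup(time_on_one, time_on_n):
    """How many times faster the job runs on n processors than on one."""
    return time_on_one / time_on_n


def efficiency(time_on_one, time_on_n, n):
    """Speedup per processor; 1.0 means perfectly scalable."""
    return speedup(time_on_one, time_on_n) / n


# Twice the processors, half the time: perfectly scalable.
print(efficiency(100.0, 50.0, 2))
# Four processors, but the job is only 2.5 times faster.
print(efficiency(100.0, 40.0, 4))
```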
24. Types of Resources
Storage
• “Data grid”: machines on the grid provide some quantity of storage for grid use
• Internal memory: temporary storage
• Secondary storage can be used to increase the capacity, performance, sharing,
and reliability of data
• Any individual file or database can span several storage
devices and machines
25. Types of Resources
Communications
• Includes communications within the grid and external to the
grid
• Bandwidth available for large data communications can often
be a critical resource
• External communication: access to the Internet
• Redundant communication paths are sometimes needed to
better handle potential network failures and excessive data
traffic
26. Types of Resources
Software and licenses
• Some software is too expensive to install on every grid
machine
• License fees can be a significant expense for an organization
27. Types of Resources
Jobs and applications
• Jobs are programs which may compute
something, execute one or more system
commands, move or collect data, or
operate machinery
• An application is one or more jobs that are scheduled to run on machines in the
grid
• Jobs may run in parallel on different machines in the grid
• Results are collected and assembled to produce the answer
28. Grid software components
Management components
• A component keeps track of the resources available to
grid users/members
• Measurement components determine both the
capacities of the nodes on the grid and their current
utilization rate at any given time; this is used for scheduling,
to determine the health of the grid, and to alert on:
– Any fault
– Congestion
– Over-commitment
• Advanced grid management software can automatically
manage many aspects of the grid; this is known as autonomic
computing, or recovery-oriented computing
29. Grid software components
Management components
• Autonomic management software can automatically recover from various kinds of grid failures and
outages by finding alternative ways to get the workload
processed
30. Grid software components
Distributed grid management
• Larger grids may have a hierarchical or other type of organizational topology
• The work involved in managing the grid is distributed to increase scalability
• A central job scheduler submits a job to a lower-level scheduler, which handles the
assignment to a specific machine
Submission software
• Any member machine of a grid can be used to submit jobs and queries to the grid
• In some grids, this function is implemented as a separate component installed on submission
nodes or submission clients
• When a grid is built using dedicated resources, separate submission software is usually
installed on the user’s PC or workstation
31. Grid software components
Donor software
• Some sort of identification and authentication procedure must be
performed before a machine can join the grid
• Certificate Authorities can be used to establish and ensure the identity of
the donor machine as well as the users and the grid itself
• Letting a machine join the grid without any special authentication, or letting
any user submit jobs to the grid, creates serious security problems
• The donor machine will usually have some sort of monitor that measures how busy it is
• The grid management and donor software must communicate
with each other to send the job and receive the result
32. Grid software components
Schedulers
• Job scheduling software locates a machine to run a grid job
submitted by a user
– The simplest schedulers assign jobs in round-robin fashion
– More advanced schedulers implement job priorities in
the queue
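The two scheduling styles above can be sketched in a few lines. The function and class names are hypothetical, and the rule that a larger number means higher priority is an assumption; real grid schedulers also take machine load and job requirements into account.

```python
import heapq
from itertools import count, cycle


def round_robin(jobs, machines):
    """Assign jobs to machines in rotating order."""
    rotation = cycle(machines)
    return [(job, next(rotation)) for job in jobs]


class PriorityScheduler:
    """Job queue in which higher-priority jobs are dispatched first."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: keeps submission order

    def submit(self, job, priority):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._order), job))

    def next_job(self):
        return heapq.heappop(self._heap)[2]


print(round_robin(["j1", "j2", "j3"], ["m1", "m2"]))

queue = PriorityScheduler()
queue.submit("backup", 1)
queue.submit("forecast", 5)
print(queue.next_job())
```

Round-robin ignores how busy each machine is, which is why more advanced schedulers combine priorities with the utilization measurements described earlier.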
Communications
• A grid system may include software to help jobs
communicate with each other
• An application may split itself into a large number of subjobs
• Subjobs need to be able to locate other specific subjobs,
establish a connection with them, and send the appropriate data
33. Intergrid and Intragrid
Intragrid and intergrid
• A simple grid consists of just a few
machines: homogeneous systems in
one department
• Intragrid: a configuration of heterogeneous
machines, with more types of
resources available across multiple
departments but within the same
organization
• Intergrid: the grid may grow to cross
organization boundaries and may be
used to collaborate on projects of
common interest
• The highest levels of security are usually
required in the intergrid configuration
34. Comparison with P2P Applications
– Grid: aggregates distributed high-end machines such as clusters. P2P
concentrates on low-end systems such as PCs
– Grid is targeted at scientific and business domain applications
– Grid shares not only files but also other IT resources
– P2P offers no guarantee of availability or system connectivity
– P2P is an approach to decentralization that makes no particular
effort to present a virtualized single system image
– Grid is about bringing all the resources together, with equal
commitment to sharing, and presenting a virtual single system image
35. Comparison with Clusters
• Clusters: a collection of computing nodes providing processor (primarily) or
data sharing
• Clusters: connected by a high-speed local interconnection network in a
local data centre
• A grid is a large-scale phenomenon, stretching over a huge geographical
region
• Clusters are targeted at a specified number of users and a
pre-defined set of applications, or in the most generic form are simply
load-sharing systems
• A grid, on the other hand, is designed for a dynamic number of users and
applications
36. Comparison with Clusters
• A cluster presents a single system image: the user can view the
complete collection of nodes as one single computer
• A grid, by contrast, presents a virtual image of a single large computing device
• In clusters, the guarantee of service is much stronger than in
grids
• Cluster nodes are expected to give their full resources and are fully devoted to the complete
system
• In grids, the connected computing nodes are required to give only some
(as much as is available) of their computing power
• Clusters are (generally) homogeneous collections of machines, tightly coupled over a small
region
• Grids are the complete opposite: heterogeneous collections of nodes in which
the heterogeneity is hidden by middleware
37. Comparison with Client/Server models
• Client/server models: a distributed form of computing, with nodes stretched
across a large geographical region and providing end-user services
(Web Services)
• These models are end-to-end oriented and session based; the computers are coupled
as group(s) working in a request/response cycle, not working together for the
same purpose
• Grids are designed as computing machinery
that presents a unified interface to the user
• In client/server, the interface changes with each change of service
or server
38. Comparison with Distributed Computing
• Grid computing is a form of distributed computing
• In a grid system, the user is not required to know anything about the
underlying topology or any individual node in particular
• Distributed computing is about sending requests to specific node(s)
• In a grid, interaction is with the system as a whole and not with any
node(s) in particular
• The resemblance is limited to the distributed nature
39. Comparison with Web Services
• The protocols of the WWW are open and general purpose; they support distributed
resources but not their coordinated use
• The Web is mainly focused on communication, whereas grid computing
enables resource sharing and collaborative interplay towards a
common goal
• The Web provides basic infrastructure for data exchange between two
different distributed applications, whereas a grid aggregates
high-end resources for solving large-scale problems
• Both hide the complexities of the system
• Web services are used to support grid computing, since the Web
has emerged as the standards-based approach for accessing
network applications
40. Conclusion
Grid computing appears to be a promising trend for three
reasons:
• It makes more cost-effective use of a given amount of
computer resources
• It provides a way to solve problems that can't be
approached without an enormous amount of computing
power
• It suggests that the resources of many computers can
cooperate with each other so that they are able to solve a
common problem
42. Grid Computing
• Please watch these videos for an introduction to Grid Computing:
https://www.youtube.com/watch?v=m6C9gMgX62A
https://www.youtube.com/watch?v=esVzoSqQ1Pc
IBM Grid Computing Demo
• Please visit more websites to understand the
concepts of Grid Computing