Coordinates resources that are not subject to centralized control (not for each single node).
Uses standard, open, general-purpose protocols and interfaces.
Provides a high quality of service.
Reference: “What is the Grid?” by Ian Foster
Grid: A Virtual Organization
The Grid resource sharing paradigm has a greater scope than P2P systems. The Grid implicitly allows direct access to computers, software, data and any other resources.
Both providers and consumers define clearly what they will share, who can share, and the conditions under which sharing will take place.
A set of individuals and/or institutions defined by such sharing rules forms what we call a Virtual Organization.
Grid: An Evolution, Not a Revolution
Source: IBM Grid Computing
Grid can be seen as the latest and most complete evolution of more familiar technologies:
Like the Web:
Grid keeps complexity hidden: multiple users enjoy a single unified experience.
Unlike the Web:
Grid enables full collaboration toward real business goals.
Like peer-to-peer:
Grid allows users to share files.
Unlike peer-to-peer:
Grid shares not only files, but everything that can be shared.
Like clusters and distributed computing:
Grid brings computing resources together.
Unlike clusters and distributed computing:
Grid can be geographically distributed and heterogeneous.
Like virtualization technologies:
Grid enables virtualization of IT resources.
Unlike virtualization technologies:
Grid can enable virtualization of vast and disparate resources.
Originally Targeted Applications
What types of applications will the Grid be used for?
NetSolve, large archives
Sloan Digital Sky Survey, Weather forecasting
Insors, GriPhyN, SciRUN
Grid Problem Defined:
The Grid problem is defined as “coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations”.
This sharing raises many issues that were not addressed by distributed computing, for example:
How to structure flexible, transient relationships.
How to structure fine-grained access control over resources, taking care of local and global policies.
How to agree on quality of service, scheduling and co-allocation.
Top 500 Supercomputers (June 2003)
Earth Simulator: NEC, Yokohama: 35.86 TFlops
ASCI Q: LANL, Los Alamos, HP AlphaServer SC: 13.88 TFlops
MCR Linux Cluster: LLNL, Livermore: 7.634 TFlops
ASCI White: LLNL, Livermore, IBM SP Power3: 7.304 TFlops
Seaborg: NERSC/LBNL, Berkeley, IBM SP Power3: 7.303 TFlops
Source: http://www.top500.org
Latest News (Nov 8, 2003)
Virginia Tech's Big Mac took over the 3rd position. It consists of 1,100 Macintosh PCs and performs at 17 TFlops.
General highlights from Top 500 (June 2003)
157 systems are reported to have peak performance above 1 TFlops.
Total accumulated performance is 375 TFlops (up from 293 TFlops).
Entry-level performance is 245.1 GFlops (up from 195.8).
A total of 119 systems (up from 56) use Intel processors.
149 systems are now labeled as clusters (up from 53).
23 of them are self-made (up from 14).
Among the top 10, 7 are from the US, 2 from Japan and 1 from France.
Economics and Control
These infrastructures are very expensive and require years of hard work.
The sheer force of economics requires that these resources be under strict control and optimally utilized.
Freedom is often costly and chaotic.
This is the starting point of what we call Grid Computing.
Changing face of Enterprise Computing
Most recent enterprise systems are collections of heterogeneous resources.
The qualities of service traditionally associated with mainframe-centric computing are now essential to the effective conduct of e-business across distributed resources, inside as well as outside the enterprise.
Recently there has been an upsurge of service providers (SPs) of various types, such as web-hosting SPs, storage SPs and application SPs.
All these require standardization.
Bird’s Eye view
In the next few slides, we will look at the broader picture, followed by technical details.
Web Services Architecture
Universal Description, Discovery and Integration (UDDI): allows us to find Web Services that meet certain requirements.
Web Services Description Language (WSDL): Web Services must be self-describing. A service should tell the invoker which operations it supports and how to invoke them.
Simple Object Access Protocol (SOAP): messages are passed between client and server using SOAP.
Note: UDDI, WSDL, SOAP and HTTP are just examples. Different implementations can use different technologies.
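To make the message-passing step concrete, here is a minimal sketch of how a client might wrap an operation call in a SOAP 1.1 envelope using only the Python standard library. The operation name `add`, its parameters and the service namespace `urn:example:math` are invented for illustration. A real client would normally generate this code from the service's WSDL description rather than build it by hand.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation, params, service_ns="urn:example:math"):
    """Wrap an operation call in a SOAP envelope.

    service_ns is a made-up namespace used only for this illustration.
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation "add" with two arguments.
request_xml = build_soap_request("add", {"a": 2, "b": 3})
print(request_xml)
```

The resulting XML string is what would be sent as the body of an HTTP POST to the service endpoint.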
A Typical Web Service Invocation:
End User’s perspective
Stateless machines
The above model is stateless: it cannot remember what was done from one invocation to the next. One client can interfere with another client's operations.
The concept of factories solves the problems mentioned earlier.
Make the Grid a stateful machine
Create transient services
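The factory idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: everything runs in one process, and the class and method names (`CounterService`, `CounterFactory`, `create_service`, `invoke`, `destroy`) are invented here and are not part of any Grid toolkit API. The first half shows two clients interfering through one shared instance. The second half shows a factory handing each client its own transient instance.

```python
import itertools


class CounterService:
    """A trivial service that accumulates a value."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value


class CounterFactory:
    """Hypothetical factory: creates one transient service instance per request."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}

    def create_service(self):
        # Return a handle that identifies the newly created instance.
        handle = f"counter-{next(self._ids)}"
        self.instances[handle] = CounterService()
        return handle

    def invoke(self, handle, n):
        return self.instances[handle].add(n)

    def destroy(self, handle):
        del self.instances[handle]


# One shared instance: client B's result is polluted by client A's call.
shared = CounterService()
shared.add(5)                      # client A
shared_seen_by_b = shared.add(1)   # client B gets 6, not 1

# With a factory, each client works on its own instance.
factory = CounterFactory()
handle_a = factory.create_service()
handle_b = factory.create_service()
factory.invoke(handle_a, 5)
isolated_seen_by_b = factory.invoke(handle_b, 1)   # client B gets 1
factory.destroy(handle_a)          # transient: instances can be torn down
```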
Web Service Application: Client and Server stubs are generated automatically from the specifications.
Service: A service is a network-enabled entity that provides a specific capability (examples: the ability to move files, create processes or verify access rights).
Service = protocols + behavior
Grid services are defined by OGSA (Open Grid Services Architecture). (Open Grid Forum)
Grid services are specified by OGSI (Open Grid Services Infrastructure).
The Globus Toolkit is the most popular open implementation of OGSA.
Major Players in Grid Service World
Example from NetSolve
Suppose you want to multiply Matrix A and Matrix B. There is one site which provides the facility. You may want to directly integrate the function in your software.
request = netsolve("matmul", a, b)
C = netsolve("wait", request)
Nature of Grid Architecture
Grid architecture is a set of protocols for establishment, management and usage of dynamic, cross-organizational virtual organizations.
The main issues in the architecture are:
Application Programming Interfaces (APIs) and Software Development Kits (SDKs)
The narrow neck of the hourglass defines a small set of core abstractions and protocols. It consists of
These protocols must be chosen so as to capture the fundamental mechanisms of sharing across many different resource types.
Grid Architecture
The Fabric layer implements the local, resource-specific operations that occur on specific resources.
Connectivity protocols are concerned with communication and authentication.
Resource protocols are concerned with negotiating access to individual resources.
Collective protocols and services are concerned with coordinating the use of multiple resources.
General list of services
Identity & Authentication
Authorization & policy
High-Speed data transfer
Remote data access
Accounting and payment
At a minimum, the following resources should be available for query:
Computational resources: mechanisms for starting programs, monitoring and controlling their execution, and making advance reservations; hardware and software characteristics; state information such as current load.
Storage resources: mechanisms for putting and getting files; state information such as available space and bandwidth utilization.
Network resources: mechanisms for control over the resources allocated to network transfers; information about network characteristics and load.
Code repositories: management of versioned source and object code (CVS style).
This layer defines core communications and authentication protocols.
Communication protocols enable the exchange of data between Fabric layer resources. They include transport, routing and naming services.
Authentication protocols build on communication services to provide cryptographically secure mechanisms for verifying the identity of users and resources.
Single sign on
Single “log on” should be sufficient for access to multiple grid resources.
A program should be able to run on the user's behalf.
Integration with local security
Example: Kerberos or Unix security
User-based trust relationships.
If a user uses services from multiple service providers at the same time, the security mechanism should not require the resource providers to cooperate or interact with each other.
It is built on top of the communication protocols and defines protocols for
Payment for sharing resources.
Information protocols are used to obtain information about the structure and state of a resource (current load, usage policy, configuration, etc.).
Management protocols are used to negotiate access to a shared resource, specifying resource requirements
Quality of service.
Operations to perform
Collective: Coordinating Multiple Resources
A user may query for resources by name and/or by attributes such as type, availability and load.
Co-allocation, scheduling and brokering services
allow VO participants to request specific resources for a specific purpose and duration.
Monitoring and diagnostic services
allow monitoring for resource failures, attacks, overload, etc.
Data replication services
allow management of VO storage to maximize data access performance with respect to metrics such as response time, reliability and cost.
Grid-enabled programming systems
enable familiar programming models to be used in a Grid environment, building on other Grid services such as resource discovery and security.
Example: Globus MPI
Workload management and collaboration
provide problem-solving environments.
allow selection of the best software implementation and execution platform. Examples: NetSolve and Ninf
Accounting and payment services:
gather usage information for accounting and payment for the services.
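As an illustration of the first collective service in this list, querying for resources by name or by attributes, the following is a minimal in-memory sketch. The `Resource` record, the `find_resources` function and the sample entries are all hypothetical. A real directory service would answer such queries over a network protocol rather than from a local list.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str         # e.g. "compute" or "storage"
    available: bool
    load: float       # fraction of capacity in use, 0.0 to 1.0


def find_resources(directory, name=None, kind=None, available=None, max_load=None):
    """Return the resources matching every criterion that was supplied."""
    matches = []
    for r in directory:
        if name is not None and r.name != name:
            continue
        if kind is not None and r.kind != kind:
            continue
        if available is not None and r.available != available:
            continue
        if max_load is not None and r.load > max_load:
            continue
        matches.append(r)
    return matches


# Hypothetical directory contents.
directory = [
    Resource("cluster-a", "compute", True, 0.90),
    Resource("cluster-b", "compute", True, 0.25),
    Resource("archive-1", "storage", True, 0.40),
    Resource("cluster-c", "compute", False, 0.00),
]

# Query by attributes: available compute resources that are lightly loaded.
idle_compute = find_resources(directory, kind="compute", available=True, max_load=0.5)
# Query by name.
by_name = find_resources(directory, name="archive-1")
```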
Building on both the Grid and Web Services communities, OGSA defines uniform service semantics, called Grid Services.
OGSA defines a few persistent and many transient services.
OGSA defines interfaces for managing Grid service instances.
OGSA also defines WSDL interfaces and associated conventions.
Protocols for reliable and secure management of distributed state.
Need for service oriented view
It allows us to address the need for standard interface definition, local/remote transparency and adaptation to local OS.
It allows multiple protocol bindings to facilitate localized optimization of services.
It simplifies virtualization, which in turn allows consistent resource access across multiple heterogeneous platforms.
With a service-oriented view, we can partition the interoperability problem into two sub-problems: the definition of service interfaces, and the identification of the protocols that can be used to invoke a particular interface.
The Globus Toolkit is an open-architecture, open-source set of services and software libraries that support Grids and Grid applications.
The toolkit addresses issues of security, information discovery, resource management, data management, communication, fault detection and portability.
GRAM: Grid Resource Allocation and Management
MDS: Metacomputing Directory Service
GSI: Grid Security Infrastructure
The toolkit will be described in detail in the next presentation, so I will skip further description here.
Nature of Service
Services are location transparent.
Services are created and destroyed dynamically.
Services are stateful. Every service is assigned a globally unique name, called a Grid Service Handle (GSH).
Grid services can change during their lifetime (for example, to support new protocols).
Web services are the basis for Grid services which are the cornerstones of OGSA and OGSI.
Web Services use simple Internet based protocols to address heterogeneous distributed computing.
Web Services define a technique for describing the software components to be accessed, methods for accessing them, and methods for discovering them.
Web Services are language, programming model and system software neutral.
The term is now overused and has become a buzzword.
There is a distinction between websites and Web services. Although Web services rely on Web technologies, they have no relation to web browsers and HTML.
A website is for humans; Web services are for software.
RMI, CORBA, EJB, etc. are oriented towards tightly coupled distributed systems, where the client and server depend on each other. Web services are oriented towards loosely coupled systems, where the client might have no prior knowledge of the Web Service until it actually invokes it.
Web services: Advantages and disadvantages
Web services are platform and language independent, since they rely on XML.
Most Web services use HTTP for transmitting messages, and most Internet proxies and firewalls do not interfere with HTTP traffic.
Overhead is high: transmitting XML is expensive. Real-time applications are unlikely to use Web services under this model.
Lack of versatility: they currently provide only basic services compared to CORBA.
Service Lifetime Management
Who terminates transient, stateful services?
Under normal circumstances, a service is terminated at the request of its invoker, but in a distributed system this is difficult: components may fail and messages may be lost.
OGSA solves this problem using soft state. Every service is created with a specified lifetime, which can be extended at the request of a client or another Grid service. If no such request is received, the service is automatically terminated.
Soft state lifetime management avoids
Explicit client teardown of complex state
Resource “leaks” in the hosting environment.
OGSA has a SetTerminationTime operation within the GridService interface.
The use of absolute time in lifetime management implies the existence of a global clock that is well synchronized.
The Network Time Protocol (NTP) provides standardized mechanisms for clock synchronization (to within tens of milliseconds).
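Soft-state lifetime management can be sketched as follows. This is a simplified, single-process illustration and not the OGSA interface itself: the class and method names are invented, and time is passed in explicitly as a number so that the behaviour is easy to follow. A real hosting environment would read a synchronized clock.

```python
class SoftStateService:
    """A service instance that carries its own termination time."""

    def __init__(self, name, now, lifetime):
        self.name = name
        self.termination_time = now + lifetime

    def set_termination_time(self, new_time):
        # Called when a client (or another service) asks for an extension.
        self.termination_time = new_time


class HostingEnvironment:
    """Hypothetical host that reclaims services whose lifetime has expired."""

    def __init__(self):
        self.services = {}

    def create(self, name, now, lifetime):
        self.services[name] = SoftStateService(name, now, lifetime)

    def keep_alive(self, name, now, extension):
        self.services[name].set_termination_time(now + extension)

    def sweep(self, now):
        """Remove and return the names of all expired services."""
        expired = [n for n, s in self.services.items() if s.termination_time <= now]
        for n in expired:
            del self.services[n]
        return expired


host = HostingEnvironment()
host.create("job-a", now=0, lifetime=10)
host.create("job-b", now=0, lifetime=10)

# Only the client of job-a renews its interest; job-b's client has vanished.
host.keep_alive("job-a", now=8, extension=10)

first_sweep = host.sweep(now=12)    # job-b is reclaimed without explicit teardown
second_sweep = host.sweep(now=20)   # job-a expires once renewals stop
```

No explicit teardown message is needed for `job-b`: the absence of renewal requests is enough for the host to reclaim it.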
Services within a complex system must be independently upgradeable.
Versioning and compatibility between services must be managed and expressed so that clients can discover not only specific service versions but also compatible services.
OGSA defines conventions that allow us to identify when a service changes and when those changes are backward compatible with respect to interface and semantics.
The OGSA notification framework allows clients to register interest in being notified of particular messages, using asynchronous, one-way delivery.
OGSA defines common abstractions and interfaces for NotificationSource and NotificationSink.
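The source/sink pattern can be illustrated with a small Python sketch. The two class names are borrowed from the text, but the method names (`subscribe`, `publish`, `deliver_notification`) and the topic strings are assumptions made for this example. Delivery here is a plain in-process call, whereas the framework described above delivers messages asynchronously over the network.

```python
class NotificationSink:
    """Receives one-way notification messages."""

    def __init__(self):
        self.received = []

    def deliver_notification(self, message):
        # One-way delivery: nothing is returned to the source.
        self.received.append(message)


class NotificationSource:
    """Lets sinks register interest in a topic and pushes messages to them."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, topic, sink):
        self._subscriptions.append((topic, sink))

    def publish(self, topic, message):
        for subscribed_topic, sink in self._subscriptions:
            if subscribed_topic == topic:
                sink.deliver_notification(message)


source = NotificationSource()
load_watcher = NotificationSink()
failure_watcher = NotificationSink()

# Hypothetical topics.
source.subscribe("load", load_watcher)
source.subscribe("failure", failure_watcher)

source.publish("load", "node-7 load is 0.93")
source.publish("failure", "node-2 is unreachable")
source.publish("load", "node-7 load is 0.41")
```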
Some myths (misunderstandings) about Grid Computing
The Grid is the next-generation Internet.
The grid is a source of free cycles.
Grid requires a distributed operating system.
Grid requires a new programming model.
Grid makes high-performance computing superfluous.
Distributed Computing Economics (Views of Jim Gray)
The following items have roughly equivalent prices:
one database access
10 bytes of Internet traffic
10 bytes of disk storage
a megabyte of disk bandwidth
The break-even point is 10,000 instructions/byte.
This serves as a basis for deciding when Internet-based computing, such as Grid computing, is cost-effective.
How are the numbers computed?
A box with a 2 GHz CPU and 2 GB RAM: $2,000
A 200 GB disk, 100 accesses/s or 50 MB/s: $200
1 Mbps WAN link: $100/month
$1 is equivalent to:
3.24 GB sent over WAN (7.2 hours)
100+ Tera CPU instructions = 7.2 hours of CPU time
1 GB disk
2.592 million database accesses (in 7.2 hours)
1.296 TB of disk bandwidth (in 7.2 hours)
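The figures above can be reproduced with a few lines of arithmetic. The sketch below assumes a 30-day month, so that $1 of a $100/month link buys 7.2 hours, and takes the same 7.2-hour window for the CPU and the disk as the list does. The factor of two instructions per cycle is an assumption made here only to reach the "100+ Tera" figure. It is not stated in the slides.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600             # assumes a 30-day month
seconds_per_dollar = SECONDS_PER_MONTH / 100   # $100/month link: $1 buys 25,920 s
hours_per_dollar = seconds_per_dollar / 3600   # 7.2 hours

# 1 Mbps WAN link: bits per second -> bytes per second -> bytes per dollar.
wan_bytes = 1e6 / 8 * seconds_per_dollar
wan_gb = wan_bytes / 1e9

# 2 GHz CPU for the same 7.2 hours.
ASSUMED_INSTR_PER_CYCLE = 2                    # assumption, not from the slides
cpu_instructions = 2e9 * ASSUMED_INSTR_PER_CYCLE * seconds_per_dollar
cpu_tera = cpu_instructions / 1e12

# Disk: $200 buys 200 GB, 100 accesses/s and 50 MB/s.
disk_gb = 200 / 200
db_accesses_million = 100 * seconds_per_dollar / 1e6
disk_bandwidth_tb = 50e6 * seconds_per_dollar / 1e12

# Instructions that cost the same as moving one byte over the WAN.
instructions_per_wan_byte = cpu_instructions / wan_bytes

print(f"{hours_per_dollar:.1f} hours per dollar")
print(f"{wan_gb:.2f} GB over the WAN")
print(f"{cpu_tera:.1f} tera instructions")
print(f"{db_accesses_million:.3f} million database accesses")
print(f"{disk_bandwidth_tb:.3f} TB of disk bandwidth")
print(f"{instructions_per_wan_byte:.0f} instructions per WAN byte")
```

Under these assumptions the last ratio comes out at a few tens of thousands of instructions per byte, the same order of magnitude as the 10,000 instructions/byte break-even point quoted earlier.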
Cycle-based Computing is Almost Free
The accumulated cycles in SETI@Home are 54 Teraflops.
Google freely provides a trillion searches a year from the largest database (2 petabytes).
Hotmail freely carries a trillion e-mails per year.
Amazon.com offers a free book search tool.
Many well-known media sites offer free news …
The maintenance costs paid are low and worth paying.
What is SETI@Home?
It uses millions of computers in homes/offices world wide to analyze radio signals from space.
SETI: the Search for Extraterrestrial Intelligence aims to detect intelligent life outside Earth.
It uses a radio telescope to listen for (collect) narrow-bandwidth radio signals from space.
Data analysis: (1) computing power spectra, (2) finding “candidate signals”, (3) eliminating meaningless signals.
Embarrassing parallelism: CPU and data intensive, but with infrequent communication. (The high-bandwidth interconnects of supercomputers are not necessary!)
Who pays for the “free” computing?
Advertisers pay for it.
Google, Hotmail and Amazon.com collect $1 from a company if its site is visited 1,000 times via these “free” services: cost per thousand impressions (CPM).
Big companies are eager to pay for maintenance.
Low cost but very effective promotion.
A Web site becomes almost the only “spokesman”.
SETI@Home relies on cycles donated worldwide.
It provided 1,300 years of free computing on 2/3/03.
Cases for Grid Computing : at least 10,000 Ins/Byte
A cryptographic search problem:
only a few KB of input/output, but days of computing.
A representative job submitted to SETI@Home:
12 hours of computing on 1/2 MB of input.
A CFD computation at Cornell:
7 years of computing for 100 MB of input and 10 GB of output.
Making the animated movie Toy Story:
a 200 MB image takes several hours to render (200,000-600,000 instructions/byte).
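The cases above can be checked against the 10,000 instructions/byte threshold with a rough calculation. The instruction rate used below (2 billion instructions per second, about one per cycle on a 2 GHz CPU) is an assumption made for this sketch, as are the function names and the final "cheap job" figure. The results are order-of-magnitude estimates only.

```python
BREAK_EVEN = 10_000               # instructions per byte, from the heading above
ASSUMED_INSTR_PER_SECOND = 2e9    # assumption: ~1 instruction/cycle at 2 GHz


def instructions_per_byte(cpu_seconds, bytes_moved, rate=ASSUMED_INSTR_PER_SECOND):
    """Rough ratio of computation performed to data moved over the network."""
    return cpu_seconds * rate / bytes_moved


def worth_moving(ratio, threshold=BREAK_EVEN):
    """True when the job does enough computing per byte to justify the transfer."""
    return ratio >= threshold


# SETI@Home job: 12 hours of computing on 1/2 MB of input.
seti_ratio = instructions_per_byte(12 * 3600, 0.5e6)

# CFD computation: 7 years of computing, 100 MB of input plus 10 GB of output.
cfd_ratio = instructions_per_byte(7 * 365 * 24 * 3600, 100e6 + 10e9)

# Hypothetical cheap job doing about one instruction per byte moved.
cheap_ratio = 1.0

for label, ratio in [("SETI@Home", seti_ratio), ("CFD", cfd_ratio), ("cheap job", cheap_ratio)]:
    print(f"{label}: {ratio:.3g} instructions/byte, worth moving: {worth_moving(ratio)}")
```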
Grid Computing Should Follow the Economics
Suitable applications can be very limited.
A good solution: send a GB over the Internet to save years of computing. It is not economical to send a KB if the result can be computed locally in a second.
If Internet costs drop more slowly than Moore’s Law, the analysis becomes stronger.
Over the past 40 years, network costs have fallen much more slowly.
Cluster computing has different economics
Gbps Ethernet costs $200/port and delivers 50 MBps
this is comparable to disk bandwidth cost and 10,000 times lower than Internet costs (so the CFD computation fits better on clusters).
Opportunities for challenges
It seems to me that most of the challenges in the Grid are related to management, or to the development of applications that need the Grid.
In my view, there are no challenging issues that are specific to the Grid. Applications, networking and Internet protocols are evolving orthogonally. Therefore the success of the Grid depends on the success of its components.
How successful will the Grid be in the future? Well, I will keep mum about the future.