JClarens: A Java Framework for Developing and Deploying Web Services for Grid Computing

Michael Thomas,1 Conrad Steenberg,1 Frank van Lingen,1 Harvey Newman,1 Julian Bunn,1 Arshad Ali,2 Richard McClatchey,3 Ashiq Anjum,2,3 Tahir Azim,2 Waqas ur Rehman,2 Faisal Khan,2 Jang Uk In4

1 California Institute of Technology, Pasadena, CA, USA: {thomas,conrad,newman}, {fvlingen, Julian.Bunn}
2 National University of Sciences and Technology, Rawalpindi, Pakistan: {arshad.ali,ashiq.anjum,tahir.azim,waqas.rehman,faisal.khan}
3 University of the West of England, Bristol, UK
4 University of Florida, Gainesville, FL, USA

Abstract

High Energy Physics (HEP) and other scientific communities have adopted Service Oriented Architectures (SOA) as part of a larger Grid computing effort. This effort involves the integration of many legacy applications and programming libraries into a SOA framework. The Grid Analysis Environment (GAE) is such a service oriented architecture, based on the Clarens Grid Services Framework and developed as part of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at the European Laboratory for Particle Physics (CERN). Clarens provides a set of authorization, access control, and discovery services, as well as XMLRPC and SOAP access to all deployed services. Two implementations of the Clarens Web Services Framework (Python and Java) offer integration possibilities for a wide range of programming languages. This paper describes the Java implementation of the Clarens Web Services Framework, called 'JClarens,' and several web services of interest to the scientific and Grid community that have been deployed using JClarens.

1. Introduction

These days scientific collaborations require more computing resources (CPU, storage, networking, etc.) than can be provided by any single institution. Furthermore, these collaborations consist of many geographically dispersed groups of researchers. As an example, two High Energy Physics experiments, CMS and ATLAS, will be generating petabytes to exabytes of data that must be accessible to over 2000 physicists from 150 participating institutions in more than 30 countries. Grid computing holds the promise of harnessing computing resources at geographically dispersed institutions into a larger distributed system that can be utilized by the entire collaboration.

Since one of any site's responsibilities is to its local users, the institutes managing these sites generally retain a large amount of control over their site's resources. Local requirements at each site lead to different decisions on operating systems, software toolkits, and usage policies. These differing requirements can give some Grid environments a very heterogeneous character.

A Services Oriented Architecture (SOA) is well suited to addressing some of the issues that arise from such a heterogeneous, locally controlled but globally shared system. Three features of a SOA make it a suitable candidate for use in a Grid computing environment:

Standard Interface Definitions – The use of common interface definitions for resources allows them to be used in the same way.

Implementation Independence – Any programming language on any operating system can be used to implement services. Local sites are not required to run a particular operating system or to use a specific programming language when implementing or selecting their service implementations.

Standard Foundation Services – These services provide the basic security, discovery, and access
control features for locating and accessing remote services.

One example of a SOA is the Grid Analysis Environment (GAE), which uses the Clarens Grid Service Framework. The GAE aims to provide a coherent environment for thousands of physicists to utilize the geographically dispersed computing resources for data analysis. The GAE SOA is part of the Ultralight project, which focuses on the integration of networks as an end-to-end, managed resource in global, data-intensive Grid systems. Clarens is also utilized as a Grid Service Framework in other projects, such as Lambda Station [11], the Proof Enabled Analysis Center (PEAC), and Physh.

In this paper we discuss the design, implementation and various services of the second Clarens implementation, called 'JClarens.' The JClarens architecture is based on Java Servlet technology and XMLRPC/SOAP. JClarens can be deployed with any web server configured with a servlet engine. Both JClarens and PClarens (based on Python) are intended to be deployed as a set of peer-to-peer servers with a complementary set of deployed services.

2. Software architecture

The core of the Clarens Framework provides a set of standard foundation services for authorization, access control and service discovery, and a framework for hosting additional grid services. These additional services offer functionality such as remote job submission, job scheduling, and data discovery and access. Standard technologies such as Public Key Infrastructure (PKI) for security, and SOAP/XMLRPC for invoking remote services, are used within the Clarens framework to provide secure and ubiquitous access. The two implementations of Clarens (Python and Java) share the common set of standard foundation services. PClarens is based on the Apache Web Server with the mod_python module; in this implementation, services are written in the Python programming language.

JClarens is developed as a single J2EE web application hosted inside the Tomcat servlet container. Tomcat provides the basic HTTP transport for sending and receiving the SOAP and XMLRPC messages. The HTTPS transport is also supported. The built-in HTTPS connector for Tomcat could not be used, however, due to its lack of support for client authentication using proxy certificates. Libraries provided by the gLite project provide support for this client proxy authentication. HTTP-based authentication can also be performed using the 'system' service described below.

Figure 1. JClarens architecture

The JClarens web application uses a single servlet to handle both SOAP and XMLRPC messages, as shown in Figure 1. The servlet contains an instance of both an Axis SOAP engine and an Apache XML-RPC engine. Each incoming request is examined to determine whether it is a SOAP or an XMLRPC request. Requests are then passed to the appropriate engine (Axis or XML-RPC) for processing. The use of a single URL for both types of requests simplifies the configuration for client applications.

The use of two transport encoding engines necessitates two configurations. A Java properties file is used to configure the main JClarens web application, as well as the XMLRPC engine. The interface for a web service is specified using one configuration property, and another property is used to define the implementation class for that service. This makes it very easy to install multiple service implementations and to choose between them at startup time. A standard SOAP web service deployment descriptor file is used to configure the SOAP engine.

3. Services

In a dynamic Grid environment there is no central authority that can be trusted for all authorization and access control queries. Each local provider of a Grid service is responsible for providing these features for their set of local services.
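The content-based dispatch described in Section 2 (one servlet URL, two message engines) can be sketched as follows. This is an illustrative model only: the class name, the Protocol enum, and the root-element heuristic are hypothetical, not the actual JClarens code, which hands the request body to real Axis and Apache XML-RPC engine instances.

```java
// Illustrative sketch of content-based request routing: a SOAP message
// carries an Envelope root element, while an XML-RPC call carries a
// methodCall root element, so inspecting the payload is enough to pick
// the engine. Names here are hypothetical, not the JClarens API.
public class RequestRouter {

    public enum Protocol { SOAP, XMLRPC }

    // Decide which engine should process the incoming XML payload.
    public static Protocol detect(String payload) {
        if (payload.contains("Envelope")) {
            return Protocol.SOAP;   // would be handed to the Axis engine
        }
        return Protocol.XMLRPC;     // would be handed to the XML-RPC engine
    }

    public static void main(String[] args) {
        String soap = "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soap:Body/></soap:Envelope>";
        String xmlrpc = "<methodCall><methodName>system.listMethods</methodName></methodCall>";
        System.out.println(detect(soap));    // SOAP
        System.out.println(detect(xmlrpc));  // XMLRPC
    }
}
```

Because both protocols arrive at the same URL, a check of this kind is all the servlet needs before delegating; clients require no per-protocol configuration.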
The Clarens framework provides several core services, which include a system service that provides authorization and access control management capabilities. A group service augments the access control capabilities by allowing groups of users to be managed as a single entity. The file service provides a limited mechanism for browsing, uploading, and downloading files on a remote system. A proxy service assists in the management of client proxy credentials that are used when executing shell commands (see the Shell Service, Section 3.4, below) and when making delegated service calls from one service to another.

The core services described in the previous paragraph are part of the standard installation of Clarens. As Clarens is continuously being developed and used within projects, new functionalities and services are being created. Rather than creating a large package containing all the possible services, site administrators can decide to install additional services. Additional services add useful functionality for operating in a Grid environment, as well as some domain-specific services for use in very particular environments such as CMS. The next sections discuss several of these additional services.

3.1. Job scheduling

The job of a Grid scheduler is to enable effective use of resources at many Grid sites. The scheduler takes into account data location, CPU availability, and resource policies to perform a matchmaking process between the user's request and the Grid resources. SPHINX is a novel scheduling middleware for dynamically changing, heterogeneous Grid environments, and its architecture provides several key functionalities for efficient and fault tolerant scheduling:

Modular system: The scheduling system can be easily modified or extended. Changes to an individual scheduling module can be made without affecting the logical structure of other modules.

Robust and recoverable system: The SPHINX server uses a relational database to manage the scheduling process. The database also provides a level of fault tolerance by making the system easily recoverable from internal component failures.

Platform independent interoperable system: The JClarens service framework is used by SPHINX to provide platform and language neutral XML-based communication protocols.

SPHINX consists of two components: a client and a server. This separation facilitates system accessibility and portability. The client is a lightweight, portable scheduling agent that provides the server with scheduling requests. The client also interacts with a user who submits scheduling requests to the client. As such, it provides an abstraction layer over the scheduling service, while exposing a customized interface to accommodate user-specific functions.

SPHINX processes Directed Acyclic Graphs (DAGs) using a system based on a finite state machine. Each DAG workflow (and corresponding set of jobs) can be in one of several states. These states allow for an efficient control flow as well as graceful recovery in the case of machine failure. Each state has a module to handle control flow and machine failure. The data management component communicates with the monitoring interface and a Replica Location Service (RLS) for managing copies of data stored at many sites.

Figure 2. SPHINX architecture

Figure 2 shows the architecture of SPHINX. The SPHINX client forwards the job generated by Chimera [19] to the server for an execution site recommendation. The SPHINX server schedules the job onto a site utilizing the monitoring information and the Replica Location Service. The client can then submit the job to that site, and the tracker can send job status information back to the server. The communication between all the components uses GSI-enabled XML-RPC services.

3.2. Discovery Service

Within a global distributed service environment, services will appear, disappear, and be moved in an unpredictable and dynamic manner. It is virtually impossible for scientists and applications to keep track of these changes. The discovery service provides scientists and applications the ability to
query for services and to retrieve up-to-date information on the location and interface of a service in a dynamic environment. Although the discovery service is conceptually similar to a UDDI registry, it offers a much simpler service interface and more dynamic content. Registration with the discovery service must happen at regular intervals in order to prove that a service is still available. If a service fails to notify the discovery service within a certain time period, it is automatically removed from the registry.

The discovery service offers four methods: (1) register is used to add a new service to the registry; (2) find_server is used to locate service hosts that match certain search criteria; (3) find is used to locate service instances that match certain search criteria; (4) deregister is used to remove services from the registry (however, it is seldom used, since the registry will automatically remove a service once it fails to re-register).

Figure 3. Discovery Service

Two implementations of the discovery service interface are currently in use. The first is a peer-to-peer implementation based on Jini and the MonALISA Grid monitoring system, as shown in Figure 3. MonALISA is a Jini-based monitoring system that uses station servers to collect local monitoring data, and uses a Jini peer-to-peer network to share selected monitoring data with other station servers or other interested clients. Arbitrary monitoring data can be published to a MonALISA station server using ApMon, a library that uses simple XDR encoded UDP packets.

This first implementation of the discovery service uses the ApMon library to publish service registrations to the MonALISA Jini network. Each discovery service contains a client that listens for these service publications on the MonALISA Jini network, and stores them in an in-memory cache. The discovery service periodically purges expired entries from this in-memory cache. Since the service registry is stored in memory, it is not persistent across server restarts. This is not a problem, since the registry will be quickly repopulated with new information once the server starts up again.

The second implementation of the discovery service does not make use of the MonALISA Jini network at all. Instead, it uses a UDDI repository to store the service publications. The UDDI implementation serves as an important bridge between the simple JClarens service registry and a more full-featured and common UDDI registry. The register method of this implementation makes the appropriate UDDI service calls to insert the web service into a known UDDI registry. The find and find_server methods perform queries on the UDDI registry and reformat the results to match the discovery service interface. This UDDI implementation allows JClarens to be used in an environment that is more static and has existing UDDI repositories already in use.

3.3. Data Location Service

The Grid will have a number of computational elements that might belong to multiple organizations split across geographically dispersed sites. The sites are connected by a number of different Wide Area Network (WAN) links. These links will have different bandwidths and latencies for various reasons, such as the relative locations of the sites, the capabilities of the local telecommunications providers, etc. Some executables and data items will be large compared to the available network bandwidths and latencies. The relative locations of executables and data, the network bandwidth consumed, and the size and duration of data transfers are important within Grid-wide scheduling decisions, as these parameters might have a significant impact on computing costs.

In order to achieve some of the objectives stated above, File Access and optimized Replica Location Services in a Grid analysis environment must be combined, so that a user or user agent can specify a single logical filename and be returned the optimal physical path to that file. The Data Location Service (DLS) outlined in this paper focuses on the selection of the best replica of a selected logical file, taking into account the location of the computing resources and the network and storage access latencies. It must be a lightweight web service that gathers information from the Grid's network monitoring service and performs access optimization calculations based on this information.
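An access optimization calculation of this kind might be sketched as below. The Replica fields and the cost formula are illustrative assumptions; the actual DLS derives its inputs from MonALISA network monitoring data rather than from fixed fields.

```java
import java.util.List;

// Illustrative "best replica" selection in the spirit of the DLS:
// score each physical replica of a logical file by an estimated access
// cost, then pick the cheapest. The fields and weights are hypothetical.
public class ReplicaSelector {

    public static class Replica {
        public final String url;
        public final double pingMillis;      // network round-trip time
        public final double throughputMBps;  // observed transfer rate
        public final double serverLoad;      // relative load on the data server
        public Replica(String url, double pingMillis,
                       double throughputMBps, double serverLoad) {
            this.url = url;
            this.pingMillis = pingMillis;
            this.throughputMBps = throughputMBps;
            this.serverLoad = serverLoad;
        }
    }

    // Estimated cost in seconds: transfer time for the file, plus a
    // latency term and a load penalty. Lower is better.
    public static double cost(Replica r, double fileSizeMB) {
        return fileSizeMB / r.throughputMBps + r.pingMillis / 1000.0 + r.serverLoad;
    }

    // Return the replica with the lowest estimated cost.
    public static Replica best(List<Replica> replicas, double fileSizeMB) {
        Replica best = null;
        for (Replica r : replicas) {
            if (best == null || cost(r, fileSizeMB) < cost(best, fileSizeMB)) {
                best = r;
            }
        }
        return best;
    }
}
```

For a 1000 MB file, a replica with 100 MB/s throughput and a 10 ms ping would score about 10.01, while one with 50 MB/s, a 200 ms ping and a loaded server would score above 21, so the first is selected; real deployments would tune the weights against measured transfer times.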
This service provides optimal replica information on the basis of both faster access and better performance characteristics. The data location process allows an application to choose a replica, from among those in various replica catalogs, based on its performance and data access features. Once a logical file is requested by the user, the DLS uses the replica catalog to locate all replica locations containing physical file instances of this logical file, from which it should choose an optimal instance for retrieval. These decisions are based on criteria such as network speed, data location, file size, data transfer time and other related parameters. The service should be decentralized (not relying on some central storage system) and fault tolerant, so that when one instance goes offline, the user (or client service) is still able to work by using other instances of the service.

Figure 4. DLS Service Architecture

Figure 4 shows the DLS architecture. The DLS is a decentralized service which performs the selection process on the basis of client and file location and the respective network parameters, while utilizing the discovery service to locate the available dataset catalog services. Each catalog is queried by the DLS to find all locations where the requested file is available. The service returns a paginated list of file locations to the caller. In addition, the DLS monitors data access patterns, keeping track of often-used and often-unavailable files.

The result of a call to this service is sorted by either the reliability of the file, or by the "closeness" determined by network ping times or other network measurements from MonALISA. The DLS also evaluates the network costs for accessing a replica. For this it must use information such as estimates of file transfer times based on network monitoring statistics. The DLS selects the "best" physical file based on the given parameters. Several performance parameters, such as network speed, current network throughput, load on the data servers, and input queues on the data servers, are included in the metric for selecting the "best" replica. The discovery service makes it possible to discover and publish services in a fault tolerant and decentralized way.

3.4. Shell Service

The shell service provides a generic way to invoke local shell commands using web service calls in a secure manner. A Globus grid-mapfile is used to map a user's x509 credentials to a local user account. The shell service makes use of the suexec command line tool from Apache in order to change to the local user's uid before executing the command. suexec has built-in safeguards to ensure that it cannot be used to run commands as root or any other privileged system user.

The core method in the shell service is the cmd method. This is the method used to execute shell commands on the JClarens server. It assigns a unique ID to each request and returns this ID to the client after the requested command has been scheduled on the server. This allows the client to launch long-running commands without having to maintain a persistent connection to the server. The shell service also provides the cmd_info method for obtaining information on the status of the scheduled commands. This information includes the name of the local system user that was used to run the command, the process ID, the start and end times of the command's execution, and the directory containing the sandbox used as the working directory for the command. This directory stores standard output and error files that contain any output and errors from the command. The previously mentioned file service can be used to retrieve the contents of these files.

The ability to execute arbitrary commands as a local system user makes the shell service a valuable tool for integrating existing command line tools into web services. By providing a set of utility functions in the JClarens core server, new web services can be written that simply wrap command line tools and return the output to the user. For example, a df service can be written using the shell service utilities
that executes the system df command and returns the disk usage statistics back to the user.

4. Adding new services

One of the primary motivations for creating JClarens is to assist web service developers with the development and deployment of new services, while providing functionality needed by all services deployed in a dynamic Grid environment: authorization, authentication, and discoverability. JClarens uses the common Redhat Package Manager (RPM) packaging format for installation. This RPM package comes with a Tomcat 5.0.28 servlet engine as well as the JClarens web application itself. Once the RPM package has been installed, no further configuration is necessary to start the server. Some site-specific configuration (such as the use of a host-specific x509 certificate) will be necessary before putting the server into production.

# rpm -ivh jclarens-0.5.2-2.i386.rpm

Example 1. Command to install JClarens

A sample service is provided as a template for building new services. This sample service contains an Ant-based build file with targets to generate SOAP stubs from WSDL. The build file also contains rules for building an RPM for the service. This RPM can then be installed into an existing JClarens installation. After installing the service RPM, only a single line need be added to the global JClarens configuration file to indicate that this service should now be used. The sample service also contains a template XMLRPC binding file that can be easily customized to enable XMLRPC encoding for the service.

# rpm -ivh jclarens-shell-0.5.2-2.i386.rpm

Example 2. Command to install a new service

The process to deploy a new service in JClarens is slightly more involved than with a basic Apache Axis SOAP engine. However, JClarens provides more features than the basic Axis engine. These features include dynamic service publication and the XMLRPC bindings.

5. Future Developments

The JClarens Web Services Framework is becoming a foundation for web service development and deployment in several projects, such as the GAE, that focus on distributed and scalable scientific analysis within the CMS experiment. Future work will focus on developing services to create a self-organizing, autonomous Grid.

One such service currently under development is the job monitoring service, which will be tightly coupled with a Grid scheduler. This service will provide the user with a view of the current status of their job submission request. A set of estimation service methods will determine how long a particular Grid job submission (be it a large data transfer request or some long-running computation) will take. A steering service will provide means for the user to fine-tune a job submission, so that slow-running jobs can be redirected to faster computing sites. Future improvements to the steering service will add more autonomous behavior, making use of the job monitoring service and estimation methods to automatically detect when job execution could be optimized (reducing execution time or resource usage) and to steer such jobs automatically on behalf of the user.

Additionally, improvements will be made to further simplify the process of writing and deploying new services. A tool that will automatically generate XMLRPC binding classes from the service WSDL description is being investigated. This would reduce the amount of code that a service author needs to write, so that she does not even need to know that the SOAP or XMLRPC bindings exist.

Most services require some sort of database connectivity. JClarens supports many databases through the use of JDBC, but does so in a somewhat inelegant manner. Currently each supported database requires a new implementation of the service interface due to non-portable table-creation statements. A database abstraction layer would help remove some of these database-specific features, minimizing the need for multiple service implementations.

6. Related work

There are many other international Grid projects underway in other scientific communities. These can be categorized as integrated Grid systems, core and user-level middleware, and application-driven efforts. Some of these are customized for the special requirements of the HEP community. Others do not accommodate the data intensive nature of the HEP Grids and focus upon the computational aspect of Grid computing.

EGEE middleware, called gLite, is a service-oriented architecture. The gLite Grid services aim to facilitate interoperability among Grid services and
frameworks like JClarens, and allow compliance with standards, such as OGSA, which are also based on the SOA principles.

Globus provides a software infrastructure that enables applications to handle distributed heterogeneous computing resources as a single virtual machine. Globus provides basic services and capabilities that are required to construct a computational Grid. Globus is constructed as a layered architecture upon which the higher-level JClarens Grid services can be built.

Legion is an object-based "meta-system" that provides the software infrastructure so that a system of heterogeneous, geographically distributed, high-performance machines can interact seamlessly. Several of the aims and goals of both projects are similar, but in contrast to JClarens, the set of methods of an object in Legion is described using an Interface Definition Language.

The Gridbus toolkit project is engaged in the design and development of cluster and Grid middleware technologies for service-oriented computing. It uses Globus libraries and is aimed at data intensive sciences; these features make Gridbus conceptually equivalent to JClarens.

NASA's IPG is a network of high performance computers, data storage devices, scientific instruments, and advanced user interfaces. Due to its data-centric nature and OGSA compliance, IPG services can potentially interoperate with GAE services.

The WebFlow framework for wide-area distributed computing is based on a mesh of Java-enhanced Apache web servers, running servlets that manage and coordinate distributed computation, and it is architecturally closer to JClarens.

The NetSolve system is based around loosely coupled, distributed systems, connected via a LAN or WAN. NetSolve clients can be written in multiple languages, as in JClarens, and a server can use any scientific package to provide its computational software.

The Gateway system offers a programming paradigm implemented over a virtual web of accessible resources. Although it provides a portal behaviour like JClarens and is based on a SOA, its design is not intended to support data intensive applications.

GridLab will produce a set of Grid services and toolkits providing capabilities such as dynamic resource brokering, monitoring, data management, security, information, adaptive services and more. GAE services can access and interoperate with GridLab services due to their SOA based nature.

The Open Grid Services Architecture (OGSA) framework, the Globus-IBM vision for the convergence of web services and Grid computing, has been taken over by the Web Services Resource Framework (WSRF). WSRF is inspired by the work of the Global Grid Forum's Open Grid Services Infrastructure (OGSI). The developers of the Clarens Web Services Framework are closely following these developments.

7. Conclusion

JClarens has proven to be an important component of the Grid Analysis Environment. As a second implementation of the Clarens Grid Services Framework, it has provided a way to integrate existing Java Grid services with little difficulty. The tight integration with the MonALISA monitoring system has given rise to a new set of Grid services that can use a global view of the state of the Grid in order to make optimized decisions.

JClarens was chosen as a Grid service host by the SPHINX development team based on JClarens' Java implementation, its MonALISA integration, and its publication of services via the dynamic discovery service registry. Already more projects within the HEP community are looking to JClarens to host their Grid services.

8. Acknowledgements

This work is partly supported by the Department of Energy grants DE-FC02-01ER254559, DE-FG03-92-ER40701, and DE-AC02-76CH03000 as part of the Particle Physics DataGrid project, and by the National Science Foundation grants ANI-0230967, PHY-0303841, PHY-0218937, and PHY-0122557. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Department of Energy or the National Science Foundation.

9. References

[1]
[2] intranet/index.htm
[3] F. van Lingen, J. Bunn, I. Legrand, H. Newman, C. Steenberg, M. Thomas, P. Avery, D. Bourilkov, R. Cavanaugh, L. Chitnis, M. Kulkarni, J. Uk In, A. Anjum, T. Azim, Grid Enabled Analysis: Architecture, Prototype and Status. CHEP 2004, Interlaken.
[4] C. Steenberg, E. Aslakson, J. Bunn, H. Newman, M. Thomas, F. van Lingen, The Clarens Web Service Architecture. CHEP 2003, La Jolla, California.
[5] C. Steenberg, J. Bunn, I. Legrand, H. Newman, M. Thomas, F. van Lingen, A. Anjum, T. Azim, The Clarens Grid-enabled Web Services Framework: Services and Implementation. CHEP 2004, Interlaken.
[6]
[7] outCERN/CERNFuture/WhatLHC/WhatLHC-en.html
[8]
[9]
[10]
[11]
[12] M. Ballintijn, Global Distributed Parallel Analysis using PROOF and AliEn, in Proceedings of CHEP 2004.
[13]
[14]
[15]
[16] A. Ali, A. Anjum, T. Azim, M. Thomas, C. Steenberg, H. Newman, J. Bunn, R. Haider, W. Rehman, JClarens: A Java Based Interactive Physics Analysis Environment for Data Intensive Applications. ICWS 2004, pp. 716-723.
[17] Jang-uk In, Paul Avery, Richard Cavanaugh, Laukik Chitnis, Mandar Kulkarni, Sanjay Ranka, SPHINX: A fault-tolerant system for scheduling in dynamic grid environments, in the proceedings of the 19th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2005), Denver, Colorado, April 2005.
[18] Jang-uk In, Adam Arbree, Paul Avery, Richard Cavanaugh, Sanjay Ranka, SPHINX: A Scheduling Middleware for Data Intensive Applications on a Grid, in the proceedings of Computing in High Energy Physics (CHEP 2004), Interlaken, Switzerland, September 2004.
[19] I. Foster, J. Vockler, M. Wilde, Y. Zhao, Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation. 14th International Conference on Scientific and Statistical Database Management, SSDBM 2002.
[20] I. Legrand, MonALISA - MONitoring Agents using a Large Integrated Service Architecture. International Workshop on Advanced Computing and Analysis Techniques in Physics Research, Tsukuba, Japan.
[21]
[22] A. Anjum, A. Ali, H. Newman, I. Willers, J. Bunn, W. Rehman, R. McClatchey, R. Cavanaugh, F. van Lingen, C. Steenberg, M. Thomas, Job Monitoring in an Interactive Grid Analysis Environment. Poster #272 at CHEP 2004.
[23] A. Anjum, A. Mehmood, A. Ali, H. Newman, I. Willers, J. Bunn, R. McClatchey, Predicting Resource Requirements of a Job Submission. Poster #273 at CHEP 2004.
[24] A. Anjum, A. Ali, H. Newman, I. Willers, J. Bunn, M. A. Zafar, R. McClatchey, R. Cavanaugh, F. van Lingen, C. Steenberg, M. Thomas, Job Interactivity using a Steering Service in an Interactive Grid Analysis Environment. Poster #271 at CHEP 2004, Interlaken, Switzerland.
[25]
[26] Foster I., Kesselman C., Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputer Applications 1997; 11(2):115-128.
[27] Grimshaw A., Wulf W., The Legion vision of a worldwide virtual computer. Communications of the ACM 1997; 40(1).
[28] Buyya R., The Gridbus Toolkit: Enabling Grid computing and business.
[29] Johnston W., Gannon D., Nitzberg B., Grids as production computing environments: The engineering aspects of NASA's Information Power Grid. Eighth IEEE International Symposium on High Performance Distributed Computing, Redondo Beach, CA, August 1999. IEEE Computer Society Press: Los Alamitos, CA, 1999.
[30] Akarsu E., Fox G., Furmanski W., Haupt T., WebFlow: high-level programming environment and visual authoring toolkit for high performance distributed computing. SC98: High Performance Networking and Computing, Orlando, FL, 1998.
[31] Casanova H., Dongarra J., NetSolve: A network server for solving computational science problems. International Journal of Supercomputing Applications and High Performance Computing 1997; 11(3).
[32] Akarsu E., Fox G., Haupt T., Kalinichenko A., Kim K., Sheethaalnath P., Youn C., Using Gateway system to provide a desktop access to high performance computational resources. The 8th IEEE International Symposium on High Performance Distributed Computing (HPDC-8), Redondo Beach, CA, August 1999.
[33] Gridlab.
[34] Globus.
[35] Global Grid Forum.