Sixth Framework STREP 045256                                                                 Deliverable 1.0
Deliverable D1.0 Survey of state of the art
Commercial in Confidence

Europe-China Grid Internetworking
European Sixth Framework STREP FP6-2006-IST-045256

Deliverable D1.0
Survey of state of the art

The EC-GIN Consortium
Universität Innsbruck, UIBK, Austria
University of Zürich, UniZH, Switzerland
Institut National de Recherche en Informatique et Automatique, INRIA, France
Lancaster University, ULANC, U.K.
Justinmind, JIM, Spain
EXIS IT ltd, Greece
University of Surrey, UniS, U.K.
Beijing University of Posts and Telecommunications, BUPT, China
Institute of Software, Chinese Academy of Sciences, ISCAS, China
China Telecommunication Technology Labs, CTTL, China
China Mobile Group Design Institute Co., Ltd, CMDI, China

© Copyright 2007 the Members of the EC-GIN Consortium

For more information on this document or the EC-GIN Project, please contact:
Dr. Michael Welzl
Leopold-Franzens-University of Innsbruck
Institute of Computer Science
Technikerstr. 21a
A-6020 Innsbruck
Austria
Phone: +43 (512) 507-6110
Fax: +43 (512) 507-2758
E-mail: michael.welzl@uibk.ac.at

Version 4.0

Document Control

Title: Survey of state of the art
Type: Public
Editor(s): Werner Heiß, Michael Welzl
E-mail(s): werner.heiss@uibk.ac.at, michael.welzl@uibk.ac.at
Author(s): Thomas Bocek, Dragana Damjanovic, Yehia El khatib, Linghang Fan, Hasan, David Hausheer, Tao Liu, Cristian Morariu, Marcelo Pasin, Pascale Primet, Peter Racz, Gregor Schaffrath, Burkhard Stiller, Jin Wu, Chen Zhang, Lin Zhang, Shaohua Liu, Xiaolei Ma, Li Li, Chongwu Chen, Kun Wang, Wei Li
Doc ID: D1.0-v4.0.doc
Delivery Date: 31. 5. 2007

Legal Notices

The information in this document is subject to change without notice. The Members of the EC-GIN Consortium make no warranty of any kind with regard to this document, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The Members of the EC-GIN Consortium shall not be held liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material.

Table of Contents

1  Executive Summary
2  Introduction
   2.1  Document Outline
3  A net-centric survey of Grid applications
   3.1  Introduction
   3.2  Survey Overview
      3.2.1  Aim of the Survey
      3.2.2  Initial Idea and Draft History
      3.2.3  Contents of the Questionnaire
      3.2.4  Target Audience
      3.2.5  Dissemination & Collection
   3.3  Results of the Survey
      3.3.1  Research Field
      3.3.2  Scale
      3.3.3  Composition
      3.3.4  Data Granularity
      3.3.5  Data Timeliness
      3.3.6  Encryption
      3.3.7  Accounting
      3.3.8  Data Replication
      3.3.9  Data Path
      3.3.10  Network Transport Protocol
      3.3.11  Middleware
      3.3.12  Special Network Services
   3.4  Possible Improvements
   3.5  Conclusion
4  Use Cases
   4.1  OGF Use Cases
      4.1.1  Path-oriented Use Case
      4.1.2  Knowledge-based Use Cases
   4.2  EC-GIN Use Cases
      4.2.1  Data Access Service
      4.2.2  Large File Transfer
      4.2.3  P2P Video-over-IP
      4.2.4  Botnet Detection
      4.2.5  Mobile IPTV
      4.2.6  Mobile Grid
5  Requirements
   5.1  Requirements derived from the Questionnaire
   5.2  Requirements derived from Use Cases
      5.2.1  OGF Use Cases
      5.2.2  EC-GIN Use Cases, part 1: OGF-like requirements
      5.2.3  EC-GIN Use Cases, part 2: Grid Economics
      5.2.4  EC-GIN Use Cases, part 3: Mobile Services
   5.3  Application Awareness Support
      5.3.1  The Requirements of Application Awareness
      5.3.2  Main Principles
      5.3.3  Trade-off Analysis in Existing Transparent Network Enhancements
   5.4  Peer Awareness Support
      5.4.1  Resource Management Techniques in P2P File Sharing Tools
      5.4.2  Grid Resource Management Based on P2P Technology
6  Conclusion
7  References
8  Abbreviations
9  Acknowledgments
10  Appendices
   10.1  Questionnaire for the Network Requirements of Grid Applications

1 Executive Summary

The aim of this document is to provide a consolidated set of requirements that the GINTONIC architecture needs to take into account. These requirements are derived from a look at the state of the art, addressing the application's point of view (identifying the network peculiarities and needs of Grid applications) as well as the user's point of view.

The network requirements of Grid applications are identified by looking at them in two ways: firstly, a survey in which Grid application programmers and users were asked about network-related details of their applications is presented and analyzed. Secondly, use cases of relevance to EC-GIN are examined. In addition to network performance, aspects such as Grid Economics and mobility are identified as a result of this investigation.

The technological requirements of the GINTONIC architecture fall in the categories of "application awareness", where the right trade-off between transparently supporting applications and explicitly communicating with them must be found, and "peer awareness", where key functionality for locating potential helpers in the network must be included. These two aspects are addressed by means of a brief overview of related work.

2 Introduction

The intention of EC-GIN Work Package 1 is to design the architecture of GINTONIC. The first step towards this goal, a requirements analysis based upon the state of the art, is taken with this document. The state of the art under consideration must encompass a large variety of Grid applications with their usage scenarios as well as technical work that is related to the foundation of the GINTONIC architecture; it should not be confused with related work on Grid-specific network mechanisms, which is the focus of Deliverable 3.0.

Since the goal of the state of the art overview presented in this document is to derive requirements for the architecture, it consists of three distinct elements:

1. Network requirements of Grid applications, consisting of
   a. requirements which are already known
   b. requirements which still must be derived
2. Requirements of Grid Economics
3. Technical requirements that the GINTONIC architecture must address in order for the mechanisms that will be embedded in it to carry out their envisioned tasks in an efficient and correct manner

For addressing 1 a) above, it was decided to carry out a survey of the network behavior and requirements of Grid applications. This survey was done by developing a questionnaire which was then given to Grid application users and programmers, both as a physical document (at the 20th OGF meeting in Manchester, UK) and via email, using an on-line version of the questionnaire.

Use cases are another, extremely useful way to derive application requirements; thus, it was decided to address 1 b) above with a documentation of use cases. A number of relevant use cases have already been identified by the Grid High-Performance Networking Research Group (GHPN-RG) of the Open Grid Forum (OGF); thus, we begin our overview with an abridgment of the related OGF document ([Fer05]). This is complemented with several additional use cases that were identified by the EC-GIN consortium.

For element 3) above, two key issues have been identified for the architecture underlying GINTONIC:

1) Application Awareness, which refers to the trade-off between transparent (i.e. unknown to the application) operation of a network enhancement within the stack on the one hand, and explicit communication between the application and the network stack on the other
2) Peer Awareness, which refers to the necessity of knowing about the existence and location of other nodes such that they can aid in the provisioning of network enhancements for the sake of Grid applications

Regarding these two issues, many frameworks and protocols have been developed in the past that partially meet some of the requirements and provide some of the key functionality that EC-GIN needs to address. Re-using and extending such components and mechanisms is essential; hence, a brief overview of such related work is included in the requirements analysis that is presented in this document.

2.1 Document Outline

The survey of the network behavior and requirements of Grid applications is documented in chapter 3; the questionnaire that was used is included in the appendix. In chapter 4, use cases of Grid applications are described, highlighting the way these applications use the network. This chapter is split in two halves: an abridgment of the OGF document on use cases, and specific use cases that were identified in the EC-GIN project. Chapter 5 describes EC-GIN requirements on the basis of:

• The survey in chapter 3
• The use cases in chapter 4
• Related work where application awareness is an issue
• Related work where peer awareness is an issue

Chapter 6 concludes with a brief summary of key lessons learned towards the fulfillment of the goals of EC-GIN.

3 A net-centric survey of Grid applications

3.1 Introduction

Work Package 1 is concerned with the architectural design of the GINTONIC API, a comprehensive implementation of the EC-GIN mechanisms. GINTONIC will provide new programming abstractions that will improve the performance of network communication across the Grid. For this design, understanding the requirements of the applications is crucial. It might be easy enough to predict these requirements according to our perception of Grid applications. However, a close look at some applications that are in operation yields a more realistic set of requirements. For this reason, we have conducted a survey of Grid applications. We have looked into the scale and characteristics of their Grids, their middleware environments, their traffic footprints, and most importantly their network requirements.

The following section provides more information about the survey and the process of conducting it. Results from the survey are presented in section 3.3. We discuss a few possible improvements that might be useful for future surveys in section 3.4, and conclude our analysis in section 3.5.

3.2 Survey Overview

The survey was conducted by circulating a 2-page questionnaire among projects employing or developing Grid applications for scientific research. An electronic version of the questionnaire was also made available on the Internet. A set of 30 individual results was collected and analyzed. This section discusses the process of the survey and the contents of the questionnaire.

3.2.1 Aim of the Survey

The aim of the survey is to draw a clearer picture of what the requirements are, based on the specifications of deployed Grid applications. The results give a recommendation of the services that need to be included in the API design. The results also describe some aspects of the applications such as scale, middleware, etc. The survey, however, is not intended to give a statistical analysis of the different aspects and characteristics of Grid applications.

3.2.2 Initial Idea and Draft History

The idea and backbone of the questionnaire emerged from discussions within the consortium about our expectations with respect to the network requirements of Grid applications. A first draft was put together and presented to the EC-GIN partners. Several other versions of the questionnaire were developed based on the feedback received from our partners.

After obtaining a fairly developed version of the questionnaire, an interview was held with a Grid application developer. The importance of this interview was two-fold; it produced the first result sample of the survey and, at the same time, it served to gauge the ease of filling
out the questionnaire by Grid application developers/administrators. The interview was very important in helping us make the questionnaire self-explanatory and inviting to fill out. The final version of the questionnaire was reached by March 16th 2007. It can be found in Section 10.1.

3.2.3 Contents of the Questionnaire

Considering that the questionnaire was not created for the purpose of conducting a thorough investigation of Grid application details, it was kept as simple as possible. The questionnaire is made up of two pages plus a front page, and it consists of a collection of short multiple-choice questions as well as a final open-ended question. However, it is noteworthy that even the multiple-choice questions in the survey leave room for the participants to write further comments, considerations and/or supporting statements.

The participants were asked to answer the questions to the best of their knowledge, using approximations for any numerical values. We also asked the participants to provide some background information (such as whether they are developers, administrators, or users of Grid applications).

The questionnaire was divided into the following twelve sections:

• Description of the Grid application: In this section, the participants are asked to give a brief overview of the purpose of the application.
• Scale: This section enquires about the scale of the Grid used, in terms of the number of nodes and administration domains.
• Composition: The participants are asked about the device diversity in their Grid. This aims at getting a feel for the type and level of heterogeneity of the computational resources available in the Grid.
• Data granularity: It is often said that Grid computing requires the efficient transfer of large bulks of data across networks. However, in order to achieve efficient communication, we need to know more about such "bulks of data" and about the traffic in general. That is the purpose of this section. Here, the participants are asked to provide approximations of the amount and pattern of the traffic.
• Data timeliness: Similar to the previous section, this section intends to find out more about the characteristics of the traffic. In this case, the aim is to learn about any delay-intolerant traffic and the reason behind such delay-sensitivity.
• Encryption: This section asks about the amount of the traffic which is encrypted, and about the provision of encryption services.
• Accounting: In this section, we ask the participants about the metrics their applications consider for accounting.
• Data replication: This section addresses the topic of replica management, which Grids utilize in order to optimize reliability, response latency, etc. We will use the results from this section to deduce whether or not GINTONIC is required (or rather expected) to facilitate such functionality.
• Data path: Here we enquire about any one-to-many communication services used by the application. Again, this is a service that we might want GINTONIC to provide.
• Network transport protocol: The intention of adding this section is to learn about the importance of connection-oriented communication as opposed to connectionless communication.
• Middleware: Here we ask about the middleware solutions employed.
• Special network services: This section enquires about more network services that might be supported by GINTONIC. Examples given in the questionnaire include Transfer Delay Prediction, Network Topology Information, etc. Participants were asked to add other services that are important to their application.

3.2.4 Target Audience

The questionnaire was sent out to a number of projects which employ or are in the process of developing a Grid application. Due to the technical nature of some of the questions, we intentionally targeted people who have adequate experience with Grid applications. This includes the developers, administrators, and advanced users who have used the system enough to know about its behavior and requirements.

Project            Application Name
-----------------  ----------------------------------------------------------
Austrian Grid      WIEN2K
Automatika         Automatika
Bridge             Bridge
ChinaGrid          Course Online
CNGrid             PSDM (PreStack Depth Migration)
Crown              CROWN
Dzero              SAMGrid, GridFTP
EC-GIN             Mobile IPTV, PPLive
EGEE               EGEE, WIEN2K
Epidaure           Bronze Standards
EXIS               EXIS Bill Printing
EuroGrid           EuroGrid
GeSRM              GROWL
GRASIL             GRASIL
GREDIA             Mobile.News
InGrid             InGrid
Invmod/Wasim       Invmod
LHC                ATLAS
Montage            Montage
MyTies.to          MyTies.to
POV-RAY            POV-RAY
QCDGrid            DiGS (QCDGrid)
TAG                ray2mesh
White Rose Grid    DAME (Distributed Aircraft Maintenance Environment)
--                 DMSS (Distributed Media Service System)
--                 MeteoAG
--                 Parallel FDTD (Finite-Difference Time-Domain Calculations)

Table 3-1: A List of the Surveyed Grid Applications and their Projects

3.2.5 Dissemination & Collection

Once the final version was ready, the questionnaire was disseminated at Grid-related events (such as OGF20) and circulated via email to relevant mailing lists and to various projects where people work with Grid applications. The use of an interactive Web-based questionnaire made filling out the questionnaire a more user-friendly experience and consequently more appealing to the participants. Not surprisingly, we received more responses using the online questionnaire than using hard copy. In a couple of cases, participants were asked for a short interview to get more details or to clarify their responses.

Out of tens of projects that were approached, sixteen results were collected by April 15th 2007. Analysis of the results followed and the outcome was presented in the first version of this document submitted on July 4th 2007. Since then, fourteen more results were collected through the Web-based questionnaire. The last result was received on June 25th 2008. We added the new results to the previous ones and reanalyzed the whole set. Table 3-1 lists the names of the surveyed Grid applications and their corresponding projects (if any).

3.3 Results of the Survey

In this section, we present the set of results compiled from every section of the questionnaire, as well as a brief commentary.

3.3.1 Research Field

23% of the applications we surveyed are used for Engineering; 10% are used for each of Mathematical Analysis, Meteorology, Particle Physics, and Visualization; 7% for each of Content Distribution, Medicine & Pharmaceuticals, and Social Sciences; and 3% for each of Education, Environmental Sciences and Software Development. Figure 3-1 illustrates this distribution.

Figure 3-1: Pie-chart illustrating the research fields of the surveyed applications

3.3.2 Scale

14% of the surveyed applications are deployed over Grids made up of 10 nodes or fewer, 57% over 100-400 nodes, and 29% over more than 1000 nodes. Almost all of these Grids span 1-20 administrative domains. The single exception was a Grid that has nodes in more than 1000 different domains.

3.3.3 Composition

Half of the surveyed applications are deployed solely on dedicated super-nodes and/or clusters. Only a single application is deployed on a Grid network free of dedicated super-nodes, consisting only of desktop computers. In the remaining Grids, the devices are divided as follows: 47% dedicated super-nodes, 44% desktop machines, and 9% mobile devices. It is worth noting that only one application uses small devices (such as embedded processors) and they hardly constitute 1% of the total number of devices in that Grid.

Further inspection revealed some interesting relationships between certain application types and the composition of their Grids. All the surveyed image analysis applications, for instance, are deployed on Grids composed solely of super-computers; all surveyed simulation applications are deployed on almost 100% super-computers; and all surveyed data management applications are deployed over Grids where there are far more low-performance, loosely-coupled computers than powerful, tightly-coupled super-computers.

3.3.4 Data Granularity

Based on the (approximate) values given by the participants, the survey revealed that the three most common dataset sizes are around 10 MB. Figure 3-2 depicts the logistic distribution of dataset sizes, while Figure 3-3 illustrates the corresponding sigmoid curve.
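
The relationship between the two figures can be made concrete: the cumulative distribution function of a logistic distribution is the sigmoid curve, and its slope is the bell-shaped density. The following is a minimal sketch only. The location `MU` and scale `S` are hypothetical placeholders, not values fitted to the survey data, and the text does not state whether the fit was made on a linear or a logarithmic size axis.

```python
import math


def logistic_cdf(x: float, mu: float, s: float) -> float:
    """Sigmoid curve: CDF of a logistic distribution with location mu and scale s."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))


def logistic_pdf(x: float, mu: float, s: float) -> float:
    """Density of the same distribution, i.e. the derivative of the CDF."""
    z = math.exp(-(x - mu) / s)
    return z / (s * (1.0 + z) ** 2)


# Hypothetical parameters, chosen only for illustration.
MU, S = 0.0, 1.0

# Half of the probability mass lies below the location parameter.
median_mass = logistic_cdf(MU, MU, S)

# Numerical check that the density equals the slope of the sigmoid.
h = 1e-6
slope = (logistic_cdf(1.0 + h, MU, S) - logistic_cdf(1.0 - h, MU, S)) / (2 * h)
```

Under these assumptions, the peak of the density (Figure 3-2) sits where the sigmoid (Figure 3-3) is steepest.
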

A closer look at the numbers shows that almost 10% of the datasets of all applications are in bulks smaller than 100 kB in size, 55% are in bulks of 1-100 MB, and 17% are in bulks of 10 GB or more. Although only to a limited extent, these numbers show how different
Grid traffic is when compared to generic IP traffic (such as Web traffic). They also illustrate how mixed the dataset sizes are.

Figure 3-2: The probability density function of dataset sizes

Figure 3-3: The cumulative distribution function of dataset sizes

3.3.5 Data Timeliness

Time-critical applications need to enforce deadlines on the delivery of their packets. Packets that arrive later than the deadlines are considered of no use and are discarded. Embarrassingly parallel applications, on the other hand, do not typically impose such deadlines.
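
The discard rule described above can be sketched in a few lines. This is an illustrative sketch, not the mechanism of any surveyed application: the `Packet` type, its fields, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    seq: int          # sequence number (hypothetical field)
    deadline: float   # latest useful arrival time, in seconds
    arrival: float    # actual arrival time, in seconds


def split_by_deadline(packets):
    """Return (on_time, late); a time-critical receiver would discard `late`."""
    on_time = [p for p in packets if p.arrival <= p.deadline]
    late = [p for p in packets if p.arrival > p.deadline]
    return on_time, late


# Hypothetical sample traffic: packet 2 misses its deadline.
packets = [
    Packet(seq=1, deadline=1.0, arrival=0.4),
    Packet(seq=2, deadline=1.0, arrival=1.3),
    Packet(seq=3, deadline=2.0, arrival=2.0),
]
on_time, late = split_by_deadline(packets)
```

An application without deadlines would simply process every packet, whatever its arrival time.
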

One of the applications we surveyed, for example, is being used for forecasting Alpine watersheds and thunderstorms based on parameter measurements from data collection points deployed in the field. Data that arrives late has to be discarded in order to process the data that is due.

Besides this application, five other applications impose similar deadlines on the delivery of non-multimedia-stream data packets. These applications rely on the prompt return of results from service invocations. Even asynchronous services are subject to delivery deadlines. Interestingly enough, the time-sensitive part of the data in these six applications is mainly the part that is transferred as large bulks.

Three other applications enforce deadlines on the transfer of multimedia traffic. The rest of the surveyed applications did not report any implemented deadline schemes.

3.3.6 Encryption

Although security is a major concern in Grids, only 40% of the applications encrypt their data prior to sending it over the network. Of these applications, 50% rely entirely on the middleware to provide the encryption while 33% rely entirely on the network transport layer to encrypt the data. The remaining 17% used both the middleware and the transport layer to carry out the encryption.

It was also observed that all surveyed Particle Physics applications utilized the middleware to encrypt 100% of their data, while all surveyed Social Sciences applications utilized the network transport layer to encrypt 100% of their data.

3.3.7 Accounting

When asked about the metrics considered important for accounting measures, 73% of the participants chose CPU processing power, 50% chose available network bandwidth and 43% marked disk storage space. Some participants mentioned more specific accounting factors such as service invocations, number of employed CPUs, code generation number, and software licenses.

3.3.8 Data Replication

30% of all surveyed applications replicate at least a portion of the data they push into the network. On average, these applications replicate 73% of their traffic. It was noticed that all surveyed Particle Physics applications replicate all their data.

3.3.9 Data Path

Overall, 66% of all traffic has more than one recipient. This portion of the traffic is created by only 27% of the surveyed applications, while the remaining 73% of the surveyed applications never use one-to-many communication schemes.

All the applications that do send traffic to more than one destination employ multicast one way or another. Two thirds of these applications integrate a multicasting mechanism into the application, while the other third employs the middleware multicasting services.

Besides multicast, the anycast scheme [Par93] is utilized by 33% of the applications. Scavenging (a more advanced anycast scheme where the recipients of the data are
chosen according to specific criteria set forth by the application and verified by the resource brokering element of the middleware) is also used by 33% of the applications.

3.3.10 Network Transport Protocol

All surveyed applications utilize TCP as the main network transport protocol. This shows how important reliable communication is to Grid applications. 13% of the surveyed applications utilize UDP for transferring multimedia content. One of the applications also uses UDT (UDP-based Data Transfer Protocol), an application-level transport protocol designed for high-speed WANs [Gu07], for testing purposes only.

3.3.11 Middleware

57% of the surveyed applications run on top of a Globus-based Grid computing environment. Other applications use EGEE-LCG/gLite (20%), Askalon (10%), Condor (3%), DIET (3%), jBoss (3%), OAR & SCP (3%), GRIA (3%), or Unicore (3%). 17% of participants use proprietary middleware solutions specifically created for their respective projects. One unanticipated finding of the survey is that 37% of all applications run on top of more than one middleware solution. 30% of the applications use two middleware solutions, all of them having Globus as one of the two; 10% of the applications employ Globus along with Askalon, another 10% use Globus with EGEE-LCG/gLite, while 7% use Globus along with a proprietary middleware solution. Furthermore, some 7% of the applications use three middleware solutions, all of which include EGEE-LCG/gLite.

3.3.12 Special Network Services

Although 32% of the participants were not aware of any transfer delay prediction services, most of them were aware, and indeed 41% agreed that such services would be very useful for their Grid applications. Only 14% were already using such services, while another 14% thought they were unnecessary for their application.
As for advance network reservation, only 18% of the applications employ it to ensure the quality of communication before it commences, whereas 23% think that it would be useful for their application but do not currently employ it. Also, 18% ruled out the use of advance reservation completely, while 41% were unsure about the importance of such services.

The provisioning of network topology information appears to be a new concept to almost 52% of the survey participants. Some 24% of the participants do not think that network topology information would be of any use to their respective applications. However, 19% said that their Grid application would most definitely make use of such a service if available. These participants mentioned that knowledge obtained from such a service would help make data transfer operations more efficient. One participant said that the application his institution developed involves various operations where knowing the closest node to any given node is very useful. Another participant said
that knowledge of the network topology would help their scheduler in distributing work in a more efficient manner.

Two of the survey participants mentioned that their Grid applications require extra resource brokering on top of that provided by the middleware. One of these mentioned that they use SRB [Raj02] on top of Globus to manage shared memory across Grid nodes. The other said that the application they developed needs more information about CPU usage and availability across the Grid. This application runs over a proprietary middleware solution.

One thing all participants seemed to agree on is that the API should maintain the ease of secure communication across multiple administration domains. The necessity of secure communication is an obvious requirement of our API or perhaps any similar integrated connectivity interface. In Grids, it is necessary to maintain interoperability at different levels across all domains, and thus achieving security is made more difficult [Fos98]. Yet, according to the survey participants, achieving secure communication using any of the current middleware solutions is a cumbersome task. Lastly, most participants agreed that special network-related operating system services and special networking hardware are not required to run their applications.

3.4 Possible Improvements

In retrospect, there are a small number of things that might have improved the process of conducting such a survey, but unfortunately were not thought of early enough. Introducing such changes while the survey was already being carried out would have disrupted the process and required more time than was available. However, we find it important to mention these possible improvements for future reference and for the benefit of any similar survey. A major improvement would have been to alter the goal of the survey to cover more Grid applications.
Certainly, this would have required more time to collect the results as well as more time to analyze them. Nevertheless, a larger result set would grant the survey the potential to supply more concrete conclusions. The aim of our survey was to provide some direction for the GINTONIC API design phase. Therefore, although the survey was conducted over a short period of time (30 days) yielding a relatively small result set, the output from our analysis is still interesting and valuable in light of our purpose.

3.5 Conclusion

We have conducted a survey in which we focused on the application features and characteristics that are relevant to the design of the GINTONIC API, including network functionality, network demands, and middleware interaction. Besides offering some general information about a few of the Grid applications currently in use for scientific research, the survey highlights the main expectations of the users from an API such as GINTONIC. At the very least, the API is expected to integrate a few services in its architecture. The survey results hint at the services that are considered important by Grid applications, and those that seem to be less important. The survey has also triggered
more questions about integrated services. For instance, does transfer delay prediction make advance reservation less useful? Finally, the survey has dispelled the belief that “the majority of Grid application traffic is made of enormous volumes of data”. On the contrary, the results demonstrate that Grid traffic comes in all shapes and sizes.
4 Use Cases

Use cases are a valuable means for identifying requirements of a certain system. In this section, a number of use cases are described as a basis for designing GINTONIC, the architecture of which will be laid out in Deliverable 1.1. Long before EC-GIN started, the Grid High-Performance Networking Working Group (GHPN) of the Open Grid Forum (OGF) developed a document [Fer05] in which some use cases for network services were given. In an attempt to avoid reinventing the wheel, we begin this section with an abridgment of this document. Then, after a higher level view on the general requirements for GINTONIC, several use cases that were identified by the EC-GIN consortium are presented.

4.1 OGF Use Cases

Network services are specialized in the handling of network-related or network-resident resources. A network service is further labeled as a Grid network service whenever it has roles and/or interfaces that are deemed to be specific to a Grid infrastructure. The OGF document [Fer05] provides a high-level, structured description of some well-understood Grid network services use cases. The purpose is to facilitate the identification of network services critical to the Grid middleware and user applications and to identify relationships between different Grid network services. Use cases are divided into two groups: path-oriented and knowledge-based. The former group includes use cases with various specific connectivity service level requirements, while the latter includes information-oriented use cases related to network monitoring and usage scenarios of performance data.

4.1.1 Path-oriented Use Case

This section illustrates a number of use cases aiming at the usage of different types of network connectivity. In what follows we detail the individual scenarios.
4.1.1.1 Visualization Session

Visualization is one of the key methods used to represent raw or processed data. Remote visualization, tele-immersion, collaborative visualization, tele-operation, and distributed simulation analysis in particular require a significant amount of network resources. The users are applications requiring visualization analysis of very large data sets from remote locations. Depending on the configuration (whether computation and rendering are performed on a local or a remote site), a significant amount of network resources can be used in the following cases:

1. Streaming data to a site used to perform the processing and rendering: the network is used for streaming data from a storage device or from one or more data access devices (like a sensor, microscope etc.), or from computations performed on a remote site to a site where rendering is performed.
2. Streaming processed (rendered) data to a display: in case rendering is performed on a remote site, the generated data is streamed to a display site.
3. Interactive commands: interactive commands from the end users may need new data from the sensors, data from several remote servers, or new computations to be performed before displaying the modified results, in some cases on a near-real-time basis.

All these services are latency and jitter sensitive, especially the last one involving interactive commands. It has been shown that in tightly coupled networked manipulation tasks involving distantly located collaborating partners, 200 ms roundtrip latency is the maximum acceptable latency before the users resort to half-duplex interaction (i.e. both users cease to work simultaneously; instead they take turns to manipulate the environment, and wait to see what happens). However, it has also been shown that network jitter has far greater impact because jitter makes it difficult for users to predict how their system will react.

4.1.1.2 Large Data Streaming coordinated with Job Execution

The coordinated use of multiple resources is particularly challenging in Grid infrastructures, due to the distributed nature of the resources involved in complex workflows with internal dependencies. A Grid network service guaranteeing the timely access to remote resources allows the synchronization of the individual components of a complex workflow, with a consequent gain in terms of resource usage efficiency. In the specific case of data access, high-throughput file transfer with deadline allows for the synchronization of job execution with the transfer of input data. For example, input data can be pre-staged while in the meantime the corresponding job is waiting to be executed.
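The deadline-bound pre-staging described above reduces to a simple feasibility check: the transfer must sustain at least the file size divided by the time remaining before the job starts. The following Python sketch illustrates that arithmetic only; the function names and the example figures are hypothetical and are not taken from [Fer05].

```python
def required_throughput_mbps(size_bytes: int, seconds_until_deadline: float) -> float:
    """Minimum sustained throughput (Mbit/s) needed to move size_bytes before the deadline."""
    if seconds_until_deadline <= 0:
        raise ValueError("deadline has already passed")
    return size_bytes * 8 / seconds_until_deadline / 1e6


def can_prestage(size_bytes: int, seconds_until_job_start: float,
                 predicted_mbps: float) -> bool:
    """True if the predicted path throughput is enough to pre-stage the input in time.

    predicted_mbps is assumed to come from some prediction service (hypothetical here).
    """
    return required_throughput_mbps(size_bytes, seconds_until_job_start) <= predicted_mbps


if __name__ == "__main__":
    # Hypothetical example: 90 GB of input, job expected to start in two hours.
    size = 90_000_000_000
    print(required_throughput_mbps(size, 7200))   # Mbit/s needed
    print(can_prestage(size, 7200, 150.0))
```

A scheduler could use such a check to decide whether to start the transfer immediately, request a reservation, or postpone the job.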
The coordination of data streaming and job execution can be effectively used by any Grid application that is oriented to the processing of large volumes of data. Community schedulers need to control multiple distributed computational resources in order to serve individual workflows. By modeling the data transport as an individual service with a predictable termination time, the scheduler can potentially create a service level agreement for the entire workflow, assuring a specific end-time even when input data is not yet available locally.

4.1.1.3 High Energy Physics File Replica Management Use Cases

The High Energy Physics experiments at CERN will record data sets of several petabytes per year. The analysis of the recorded data requires the transmission of raw data to remote sites for processing and the exchange of the processing results. Data centers are divided into four Tiers: Tier-0 is the experiment itself; Tier-0 and Tier-1 sites each have all the raw data; Tier-2 and 3 sites have subsets of raw and processed data. Tier-1 sites are distributed so that large geographical areas (on the scale of a country) are encompassed. Data is typically streamed from the detectors and resides on disk for a while before being translated to tape. While the files are on disk, Tier-1 sites can pull over any files they are interested in. Tier-1 processed data is then exchanged with other Tier-1 sites. There are three distinct file transfer use cases described in [Fer05]:
1. the retrieval of raw data from a Tier-0 to a Tier-1 site;
2. data reprocessing;
3. the file transfer needed for remote job execution.

Here, we focus on the first and third use case only. In the first use case there is a time limit for migrating the data from Tier-0 to Tier-1 sites. While the data is being recorded and constantly pushed to tape, there is only a certain time window within which the data is initially available for migration to the Tier-1 sites, so the data transfer services have to make the most efficient use of this window, maximizing the use of the available bandwidth. After this phase each of the Tier-1 sites has a copy of the raw data set for one experiment. These data have already been processed once (as soon as they were recorded). Subsequently, the experiment management decides that enough further detector calibration has been performed for a complete data re-processing cycle to be performed. This requires that the complete raw data set is passed through and reprocessed again. For this phase the raw data need to be replicated at several sites and a process (not described here) selects three sites which will each run 1/3 of the processing. The sites are widely geographically separated (for example, one in the Asia-Pacific area, one in the EU and one in the US). It is required that the re-processing is completed within 2 weeks of commencement (assuming that enough CPU power has been identified for this), and it is also required that the re-processed data are distributed to all Tier-1 sites throughout the world and that this should be completed within one week of finishing the re-processing.

The third use case involves submitting a job to the Resource Broker to be executed. The job specification contains all the restrictions and requirements on the CE (Compute Element) that must run it as well as the list of requested files and physical file name of any data produced by it.
To choose the CE, the Resource Broker calculates, for each candidate CE, the cost of transferring all necessary files to it; based on this calculation and the other requirements, the Resource Broker selects a CE. For this calculation, the Resource Broker can make use of information about the network and predictions of file transfers over the network.

4.1.1.4 Emergency Application Technician Application with Integrated Wireless Sensors

iRevive is an Emergency Medical Technician (EMT) application, with integrated real-time medical vital sign wireless sensors, allowing electronic Patient Care Records (PCRs) from the point of first patient contact. Each iRevive patient has an attached real-time medical sensor that records vital sign data and also serves as a dynamic patient tag, allowing EMTs to record procedures performed on each patient. iRevive integrates the sensor data of each patient into the PCR for efficient and error-free data entry, creating a more consistent and complete PCR. This sensor integration enables quick triaging of mass casualty events. The primary users of this system are EMT teams in the field with patients. Secondary users are researchers who access data from aggregated PCRs in later data mining applications.
Real-time vital sign data might be critical to proper care, so QoS is required for latency, packet loss rate and bandwidth. Different sensors and different applications will require different QoS profiles. Furthermore, QoS network requirements will be dynamic as conditions change. The monitoring of the pulse and blood oxygen levels in a patient is an example. When normal, these vital signs might have a slow sampling rate, but as sensors detect that the levels cross a normal threshold, network bandwidth and latency requirements increase because more frequent sampling is required as the data is more critical.

4.1.1.5 Distributed Aircraft Maintenance Environment (DAME)

The Distributed Aircraft Maintenance Environment (DAME) provides a Grid-based, collaborative and interactive workbench of remote services and tools for use by human experts. It currently supports remote analysis of vibration and performance data by various geographically dispersed users: local engineers and remote experts. The diagnosis environment is built around a workflow system, and an extensive set of data analysis tools and services, which can provide automated diagnosis for known conditions. Where automated diagnosis is not possible DAME provides remote experts with a collaborative and interactive diagnosis and analysis environment. An overview of the typical diagnostic scenario including escalation to the remote experts (Maintenance Analyst and possibly Domain Expert) is described below.

1. An aircraft lands, and data from the on-wing system (QUICK) is automatically downloaded to the associated local Ground Support System (GSS).
2. QUICK and its GSS indicate whether any abnormality (this is a detected condition for which there is a known cause) or novelty (this is a detected deviation from normality for which there is currently no known cause) has been detected.
3.
DAME executes an automatic workflow to determine its diagnosis. This is a standard pre-programmed diagnostic sequence.
4. Depending on the result of the QUICK and DAME automatic diagnoses there are three outcomes:
• Everything is normal: the engine is ready for the next flight.
• A condition, which has a known cause, has been detected. This can be resolved by immediate maintenance action or planned for future maintenance action, as appropriate.
• A condition, which currently does not have a clear cause, has been detected or there is some ambiguity about the cause. This case is referred to the remote experts (a Maintenance Analyst and possibly a Domain Expert) for consideration.

Considering network resources, this use case makes use of fast relocation of a large amount of data. Each aircraft flight can produce up to 1 Gigabyte of data per engine, which, when scaled to the fleet level, represents a collection rate of the order of terabytes of data per year. The storage of this data also requires vast data repositories that may be distributed across many geographic and operational boundaries.

4.1.1.6 Networked Supercomputing

Many enterprises use High Performance Computing (HPC) clusters to run commercial HPC applications in order to increase the enterprise’s profitability and competitiveness. These applications offer significant advantages, especially considering the amount of time
saved to generate results, as this may reduce the risk of development, or enable investments to be better aligned and spent, or reduce product time to market. While the use of these parallel HPC applications usually starts in a single data center with local networked supercomputing, a clear trend can be seen today towards operating distributed data centers, which then requires the appropriate WAN technologies to support networked supercomputing in a distributed way. Within HPC, each node within the cluster needs to be able to communicate with different resources (storage, for example) and with other nodes for control and inter-process communications. Generically, communications within a cluster can be broken down into four operations:

1. Access network: the access network provides user access to the cluster to allow job scheduling and viewing of graphical data. It may also provide connectivity to remote resources such as Network Attached Storage (NAS) or other clusters within the context of a Grid.
2. Management network: the management network is the cluster's command and control network that enables the master node to schedule, start, checkpoint, and stop work that is executed on the cluster. It also allows the nodes to be monitored for troubleshooting purposes.
3. Storage or I/O network: in most HPC environments the cluster nodes download data from an external NAS or Storage Area Network (SAN) into their local disk and then perform the necessary calculations before writing the result back to the NAS or SAN. This requires high-speed access between the NAS/SAN systems and the cluster nodes.
4. IPC (Inter Process Communication) network: the IPC network provides high speed connectivity between cluster nodes such that IPC messages can be exchanged.
Because the IPC network characteristics have the most effect on application performance, the IPC network uses high bandwidth and low latency network technologies.

4.1.2 Knowledge-based Use Cases

This section includes use cases that are about the collection and usage of network performance information.

4.1.2.1 Passively Monitored Data

Passively monitored data can be used by administrative personnel in two typical scenarios: for characterization of file transfers (statistical information about source and destination of the transfers as well as the size of the transferred files) and for early fault detection (an administrator runs software which will warn her or him if users start to experience poor performance). For this use case data is collected from CEs. Information that might be passively monitored, such as application data throughput, can be created on a very large scale (for example, every single file transfer within the Grid might produce a tuple). Conversely, the measurement of such information through active techniques, requiring the active injection of test traffic, is more likely to perturb the behavior of the system under monitoring, and is consequently more invasive and less scalable.
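To make the two passive-monitoring scenarios concrete, the sketch below groups per-transfer tuples by source and destination and flags pairs whose latest transfer ran well below their earlier average. It is an illustration under assumptions: the tuple layout `(source, destination, size_bytes, seconds)`, the function names, and the 50% threshold are all hypothetical and do not come from [Fer05].

```python
from collections import defaultdict
from statistics import mean


def throughput_mbps(size_bytes, seconds):
    """Application-level throughput of one finished transfer, in Mbit/s."""
    return size_bytes * 8 / seconds / 1e6


def summarize(records):
    """Group observed throughputs by (source, destination), in arrival order.

    records: iterable of (source, destination, size_bytes, seconds) tuples,
    a hypothetical layout for the per-transfer tuples mentioned in the text.
    """
    per_pair = defaultdict(list)
    for src, dst, size_bytes, seconds in records:
        per_pair[(src, dst)].append(throughput_mbps(size_bytes, seconds))
    return per_pair


def degraded_pairs(records, threshold=0.5):
    """Pairs whose most recent transfer ran below threshold * mean of earlier ones."""
    flagged = []
    for pair, samples in summarize(records).items():
        if len(samples) < 2:
            continue  # no baseline to compare against yet
        baseline = mean(samples[:-1])
        if samples[-1] < threshold * baseline:
            flagged.append(pair)
    return flagged


if __name__ == "__main__":
    observed = [
        ("ce1", "ce2", 125_000_000, 10.0),
        ("ce1", "ce2", 125_000_000, 10.0),
        ("ce1", "ce2", 125_000_000, 40.0),  # sudden slowdown
        ("ce1", "ce3", 125_000_000, 10.0),
        ("ce1", "ce3", 125_000_000, 11.0),
    ]
    print(degraded_pairs(observed))
```

No test traffic is injected here; the only input is data that completed transfers produce anyway, which is what makes the passive approach scale.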
4.1.2.2 Administrative Setup of Schedules of Measurements

Network performance data are gathered through measurement sessions, which can be triggered on a regular basis or on-demand. Administrators require regularly scheduled and ad-hoc measurements for a variety of reasons. On-demand measurement schedules can be of various types: single ad-hoc measurements, temporary schedules and permanent schedules. Clients that may be interested in triggering on-demand measurement sessions are both administrators and middleware agents:
• administrators interested in the network state monitoring and availability of data for network problem diagnosis;
• administrators wishing to manually set up measurements to aid middleware in optimizing the functions of the Grid;
• administrators monitoring the performance of a Grid site to ensure that network resources are well provisioned and SLAs are being met;
• middleware services requiring measurements in response to changes in system configuration or usage patterns.

4.1.2.3 Service Optimization

Network performance information can be used to optimize the behavior of both user applications and middleware in Grids. In fact, network performance metrics can be composed to generate a projected view of the status of a given network path. This type of information can then be used to drive the networked behavior of software agents to minimize the overall cost of transmission involved in a complex workflow. Cost models can vary depending on the application and/or middleware requirements, and depend on a set of network performance metrics. Network costs can be used to select the preferable destination nodes (clients, servers etc.) from a set of candidates.

4.2 EC-GIN Use Cases

Previous Data Grid research efforts focus mainly on the integrated and collaborative use of resources at the end systems to meet the requirements of Data Grid applications.
More precisely, previous research efforts are more or less focused on solving problems such as data representation, data location, replication, consistency, authorization and authentication. These efforts are based on the assumption that all resources at end systems are connected by high-performance networks. However, the continuous growth of Grids towards worldwide scale brings new problems. Consider the situation where cross-continental users are connected by Wide Area Networks: even though the end systems collaborate well and resources are ideally allocated to meet the user’s requirements, access to bulk data in remote places might easily cause a performance bottleneck due to network performance degradations. To make the Grid applicable under varied network conditions, network performance should be taken into account. The underlying communication infrastructure of Grids, moreover, is a complex interconnection of LANs and WANs that introduces potential bottlenecks and varying performance characteristics [Flo95]. In EC-GIN, we propose a new architecture that bridges the gap between the performance of the network and the Grid to provide some
sort of QoS guarantee for accessing data. Not only resources in end systems, but also network resources are integrated in order to deliver a data access service.

4.2.1 Data Access Service

Figure 4-1 depicts the basic architecture of a Data Access Service for mainstream Grid platforms. A web-based data access service is exposed to the user. Users submit transfers, specify quality of service requirements, monitor status, or terminate transfers through message exchanges with the service. By providing a stateless interface, users organize and monitor data transfer through message passing to and from the data access service.

Figure 4-1: Conventional Data Access Service architecture in the Grid

The advantages of the architecture encompass the following aspects:
• The control flow is not passed to the user and held on the user end. A data transfer will be continued even when the user is disconnected.
• Detailed network information might not be passed to the user end. End users are allowed to interact with data transfers, e.g. for performance monitoring, only by using pre-agreed services.
• A Web Service can be used by several users simultaneously, which makes it possible to coordinate data transfers so as to reach optimum transmission resource consumption.
• A data access service can handle more network information and control strategies than a single end system can. This makes it possible to orchestrate the data movement more efficiently (e.g. parallel transfers are possible). Moreover, data can be obtained from data replicas in different locations.

In this architecture, point-to-point data movement is a fundamental operation between two storage resources, and provides the foundation for replication, caching, and bulk data access.
Replication acts as a high-level service that is built using basic data movement
function. It creates replicas to reduce access latency and network bandwidth consumption, maintain local control over transient and necessary data, and improve reliability and load balancing.

However, widely used data access services in a Grid give little consideration to the communication network that supports the data movement. They place more emphasis on resource sharing at end systems and simply assume that the communication network is transparent once an end-to-end connection is set up. This assumption sounds reasonable when the data access service is built on a high-performance network that connects resources that are geographically close. However, for a cross-continental Grid network covering a radius of several thousand miles, the complexities and uncertainties that arise within the communication network inevitably affect the performance of a data access service. Therefore, a new data access service architecture with network assistance is proposed, as shown in Figure 4-2.

Figure 4-2: Data Access Service Architecture with End-to-end Performance Management

In this architecture, the communication network is also considered as a resource and coordinated by the data access service. Therefore, three logical procedures can be identified for accessing a large file. First, data resources have to be identified. This is normally the effort to find out the location of the required data. Second, transmission resources have to be identified. Roughly, this is to find out the possible routes from data source to destination. Finally, the best transmission plan in terms of performance is determined and resource reservation is conducted accordingly.

Orchestrating Data Access in a Grid Environment

The detailed structure of data access with network assistance is shown in Figure 4-3.
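The three logical procedures named above (locating the data, locating transmission resources, and choosing the best transmission plan) can be sketched as follows. This is only an illustration under assumptions: the `Route` type, the dictionary-based replica catalog, and the use of predicted throughput as the sole selection criterion are hypothetical, and the reservation step is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """A candidate path from a replica site to the destination (hypothetical model)."""
    source: str
    hops: tuple            # intermediate nodes, ending with the destination
    predicted_mbps: float  # assumed to come from a network performance service


def locate_replicas(catalog, logical_name):
    """Step 1: find the sites that hold a replica of the logical file."""
    return catalog.get(logical_name, [])


def candidate_routes(routes, sources, destination):
    """Step 2: keep the routes that start at a replica site and end at the destination."""
    return [r for r in routes
            if r.source in sources and r.hops and r.hops[-1] == destination]


def best_plan(catalog, routes, logical_name, destination, size_bytes):
    """Step 3: pick the route with the highest predicted throughput.

    Returns (route, estimated_seconds), or None if no route exists.
    A real service would go on to reserve resources along the chosen route.
    """
    sources = locate_replicas(catalog, logical_name)
    candidates = candidate_routes(routes, sources, destination)
    if not candidates:
        return None
    best = max(candidates, key=lambda r: r.predicted_mbps)
    return best, size_bytes * 8 / (best.predicted_mbps * 1e6)


if __name__ == "__main__":
    catalog = {"lfn:run42": ["siteA", "siteB"]}
    routes = [
        Route("siteA", ("r1", "siteC"), 200.0),
        Route("siteB", ("r2", "r3", "siteC"), 500.0),
        Route("siteD", ("siteC",), 900.0),  # fast, but siteD holds no replica
    ]
    print(best_plan(catalog, routes, "lfn:run42", "siteC", 10_000_000_000))
```

Note that the fastest route overall is ignored because its source holds no replica; data location constrains the choice before network performance does.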
Figure 4-3: Orchestrating Data Access

The design of a data management service is modular, with several independent services interacting via the data management service. A Metadata Catalog service maintains associations between logical files and their representative characteristics. Given a set of representative characteristics, the service replies with the matching logical file identification. Replica location services serve as registries which define a mapping between a logical name and the service that can provide access to the data object. With the replica location services, replicas are not constrained to be “bit-wise” copies. The Network Weather Service (NWS) is an external service that collects network operation information. Data resource location and network performance information are fed into the scheduler, which consequently determines the best data movement plan based on performance and consistency considerations. The transmission plan is passed to a broker, which makes sure that related resources are usable during the whole transmission process. The broker is a key part in coordinating the resources, which can belong either to the end systems or to the network. The data transfer service is the underlying transport service and provides basic mechanisms for accessing and managing the data located in storage systems. These mechanisms provide abstractions for uniformly creating, deleting, accessing and modifying file instances across storage systems. The network performance service maintains abstractions of network resources, supports their manipulation, and provides predictions for file transfers.

4.2.2 Large File Transfer

Figure 4-4 illustrates the scenario of a large file transfer from one Grid node (A) to another (B). For ease of understanding, the underlying network topology, also shown in Figure 4-4, is kept very simple.
The links from each node to the closest router are assumed to have enough bandwidth to accommodate any requirement, while the links between the Version 4.0 Page 27 of 75 © Copyright 2007, the Members of the EC-GIN Consortium
routers have limited bandwidth. The key questions addressed by this scenario are the following:
• When does it make sense to send a large file via multiple paths of an overlay network in order to increase the file transfer throughput?
• How could this functionality be exposed as a transport service?
• What constraints can be considered in order to allow the delivery of a certain level of QoS?
• How could a large file transfer be authenticated and authorized by the intermediate nodes?

As the two cases depicted in Figure 4-4 show, the answer depends very much on the underlying network. If the overlay links do not share the same network bottleneck, as in case a), sending a large file via multiple paths, i.e. path 1 (red, via node C) and path 2 (blue, direct), may indeed increase the overall throughput. However, if the two paths share the same bottleneck link, as in case b), a multipath transfer does not improve the throughput at all and only increases the transport overhead. Therefore, it is essential that the overlay application is aware of the underlying network topology.

Clearly, the described problem may also be solved on a lower layer, e.g., with MPLS on layer 2. However, since the Grid nodes shown in Figure 4-4 may be distributed all over the Internet, addressing the problem on layer 2 or 3 may not be feasible due to the heterogeneity of different ISP domains. Building the multi-path data transfer solution in higher layers also allows intermediate processing of the transferred data. As an example, assume that node A wants to send a file representing a database of medical imagery to an application on B, but the application running on B requires each image to be transformed into a different format. With a multi-path transfer mechanism such as the one proposed here, intermediate nodes may perform some of the required processing (however, this is application-dependent behavior).
a) multipath file transfer is beneficial
b) multipath file transfer is not beneficial
Figure 4-4: Large File Transfer Scenario with Network Bottleneck
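The difference between the two cases can be checked numerically once the link capacities are known. The sketch below is an illustration under assumptions, not part of the proposed service: it models the links used by the overlay paths as a directed graph, computes the best single-path throughput as the largest per-path bottleneck, and computes the combined throughput with a standard maximum-flow calculation (Edmonds-Karp). The router names R1 to R3, the capacities in Mbit/s and the function names are hypothetical, and transport overhead and imperfect traffic splitting are ignored.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow over a dict {(u, v): capacity}."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)  # reverse edges for the residual graph
    neighbours = {}
    for (u, v) in residual:
        neighbours.setdefault(u, []).append(v)

    total = 0
    while True:
        # Breadth-first search for an augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total

        # Find the bottleneck along the path, then update the residuals.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        total += bottleneck


def path_links(path):
    """Turn a node sequence into its list of directed links."""
    return list(zip(path, path[1:]))


def multipath_gain(capacity, paths):
    """Return (best single-path throughput, combined multipath throughput)."""
    src, dst = paths[0][0], paths[0][-1]
    single = max(min(capacity[l] for l in path_links(p)) for p in paths)
    used = {l: capacity[l] for p in paths for l in path_links(p)}
    return single, max_flow(used, src, dst)


# Case a): the path via C avoids the R1-R2 link (hypothetical capacities, Mbit/s).
case_a = {("A", "R1"): 1000, ("R1", "R2"): 100, ("R2", "B"): 1000,
          ("R1", "R3"): 100, ("R3", "C"): 1000, ("C", "R3"): 1000,
          ("R3", "R2"): 100}
paths_a = [["A", "R1", "R2", "B"], ["A", "R1", "R3", "C", "R3", "R2", "B"]]

# Case b): both paths cross the same R1-R2 bottleneck link.
case_b = {("A", "R1"): 1000, ("R1", "R2"): 100, ("R2", "B"): 1000,
          ("R2", "C"): 1000, ("C", "R2"): 1000}
paths_b = [["A", "R1", "R2", "B"], ["A", "R1", "R2", "C", "R2", "B"]]

print(multipath_gain(case_a, paths_a))  # (100, 200): multipath doubles the throughput
print(multipath_gain(case_b, paths_b))  # (100, 100): no gain from the second path
```

A multipath transfer is only worth its overhead when the second value is clearly larger than the first, which requires the overlay application to know, or estimate, which underlay links its paths share.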
4.2.2.1 Automatika

As an example of the Large File Transfer scenario we describe Automatika, an application developed by JIM. Automatika is a system that automates software development in distributed systems. It uses three basic elements: EVC clients (Easy Visual Creator Client, a stand-alone Java application), Factory servers and Showcase servers. EVC is a Rapid Prototyping & Simulation environment for defining applications without programming. A Factory server takes the prototype as input and delivers a fully functional web application (typically larger than 100 MB), which is then deployed in a test environment (the Showcase server). There are two different networks in Automatika: one that links EVC clients with Factory servers and another that links Factory servers and Showcase servers. GINTONIC will be applied in the Factory-Showcase network. At the moment, a Factory chooses a Showcase based on how many Showcases are available and on the number of projects loaded in each Showcase. GINTONIC can equip the Factory server with valuable information about the network and hence make the application deployment considerably more efficient.

4.2.3 P2P Video-over-IP

In this scenario, as depicted in Figure 4-5, a set of peers communicate with each other via video calls. Each peer might participate in or host a video conference, relay call streams, or share its PSTN connection with the others and act as a gateway. The goal is to distribute these tasks efficiently over the network, considering both node resources, such as memory or processing power, and network properties, such as bandwidth or topology. Thus, this scenario extends the Large File Transfer scenario by:
• adding further parameters to be taken into account: node resources and potentially policy decisions become important, as do NAT, delay or jitter constraints for partial flows.
• shifting the focus from the optimization of a single flow to the optimization of multiple flows, potentially migrating conferences dynamically from one node to another during the call.

A further side aspect that might be developed in this context is the cooperative detection of selfish nodes and unsolicited calls.
a) Setup of 1 conference
b) Setup of 2 conferences
Figure 4-5: Peer-to-Peer Video-over-IP Scenario

4.2.4 Botnet Detection

In this scenario, as depicted in Figure 4-6, Grid nodes work together as a cooperative Distributed Intrusion Detection System (DIDS), in which information gathering and analysis tasks are distributed over the Grid. The key questions to be answered are:
• How can the work be distributed efficiently over the set of instances, balancing the load, respecting location constraints (e.g. due to sensor location) and not introducing too much delay due to transfers?
• Which nodes should be contacted for which information under which circumstances?

In addition to load balancing and transmission speed aspects, this scenario has important implications for policy and trust constraints. Detached from the specific use of botnet detection, the communication mechanisms may also be used to detect intrusions within the Grid and to identify and account for the misbehavior of nodes.
a) Anomaly indication
b) Setup of DIDS
Figure 4-6: Botnet Detection Scenario

4.2.5 Mobile IPTV

IPTV has been strongly advocated in recent years. The general definition of IPTV comprises four parts, each with a business role model, as shown in Figure 4-7. These parts are [ITU06]:
1) Contents: Video, audio (including voice), data, text and applications.
   - Business role model: contents provider.
2) Service Middleware Platform: Receiving, manipulating, value-added processing and transmitting of contents, with security and management according to the service provider.
   - Business role model: service provider.
3) Network Carrier: Managed, controlled and secured delivery of the contents processed by a platform, using a QoS-controlled broadband IP network, both wired and wireless.
   - Business role model: network provider.
4) Terminal: TV, PDA, cellular phone or mobile TV with an STB module (Subscriber Terminal Block) or a similar device for the customer.
   - Business role model: customer.
Figure 4-7: Overall definition and description of IPTV in the business role model (panels: Content, Service Middleware, Network Carrier, Application)

For this reason, from the customer's point of view, IPTV is not only a quality-guaranteed convergence service of telecommunication and broadcasting, because the customer can use telecommunication services (such as Internet access, e-mail, SMS, VoIP and VoD) and broadcasting services (such as terrestrial (cable, satellite) channels and narrowcasting contents) simultaneously, but also a ubiquitous service, because users can get multimedia contents of their choice at any time and in any place. Due to the great demand for IPTV services and the ever-increasing number of mobile consumers, we focus our scenario on Mobile IPTV, which allows users to watch TV on their mobile terminals. In detail, the services of Mobile IPTV are listed below:

Entertainment
• Linear/Broadcast TV;
• Video/TV on Demand (VoD);
• Interactive TV (iTV);
• Push VoD;
• Consumer Originated Video;
• Music (Audio);
• Games;
• Picture;

General Information
• Advertising;
• Sports news;
• Entertainment News;
• Emergency Information;
• General News;
• Travel Information;
• Stock Exchange Information;

Educational
• Computer/STB based training;
• Distance Learning;

Communication/Messaging
• Interactive/Communications Applications;
• Email;
• Video Telephone;

Service Information
• Interactive Program Guide (IPG/EPG);
• Parental Services;
• Notification Services;

In a traditional IPTV architecture, central servers distribute video contents to each end user, resulting in a high workload on dedicated servers and costly upload bandwidth. As the population of subscribing users grows, such a client/server (C/S) system, with its limited service capacity, can easily be overwhelmed. In a mobile environment this problem is even more pronounced, since unpredictable disconnection is considered a part of normal wireless communication. Several P2P-based IPTV systems over the Internet have been proposed to overcome this problem, such as PPLive [Hei06], PPStream [PPS07] and Gridmedia [Luo06]. Since the turn of this century, P2P networks have emerged as a scalable and efficient means of providing group communication. The core concept in P2P is that each peer, interchangeably called node or user, plays the role of both client and server at the same time. With each participant contributing its individual computation, storage and bandwidth resources to a collective pool, the overall system capacity is amplified by a factor of hundreds or even thousands. Taking Gridmedia as an example, its main structure is shown in Figure 4-8.

Figure 4-8: The structure of Gridmedia system
From the figure, we can see that the main elements of the Gridmedia system are the rendezvous point (RP) server, the streaming server and the peers. Only the streaming server and the RP server are dedicated servers; all the peers are the computers of end users. The functionality of a streaming server in Gridmedia is almost the same as that of a traditional C/S server: when it is connected to a peer, it sends the live content to that peer. The RP server facilitates the login process of newly arriving peers.

In our mobile IPTV scenario, we introduce mobile Grid technology to handle the problems that appear in the traditional C/S mode. Mobile Grid computing [Mil05] is about making Grid services available and accessible anytime, anywhere from mobile devices. The main advantages of mobile Grid computing include mobile-to-mobile and mobile-to-desktop collaboration for resource sharing, improved user experience, convenience and contextual relevance, and novel application scenarios. A Grid-based mobile environment would allow mobile devices to become more efficient by off-loading resource-demanding work to more powerful devices or computers.

IPTV based on mobile Grid technology might have the architecture shown in Figure 4-9. In this figure, mobile terminals are distributed over various administrative domains, which can also be regarded as different areas (Beijing, Shanghai and Guangzhou, for example). Each area has a domain streaming server that can send the live content to the terminals. Every domain streaming server must log in to the Grid Service, which also resides in the wired Grid, and obtains the streaming list of all the domain streaming servers. In this way, if a mobile terminal wants content outside its domain streaming server’s scope, the server can check the list and find the nearest server holding it.
Then it can contact that server to request the content; in this way, the problems that appear in the traditional C/S mode can be effectively controlled. As shown in Figure 4-9, the domain servers receive the contents from the wired Grid, in which many powerful machines provide the content, ensuring that a rich set of programs is available to satisfy the consumers' requirements.

Figure 4-9: Mobile IPTV scenario

Moreover, because mobile devices are usually portable but resource-limited devices with various sorts of networking capability, a proxy-based solution is used in the mobile Grid
([Fer05], [Gua05]). A proxy device might be a desktop computer or a small server that is available to nearby mobile devices via the local wireless LAN and is also connected to the domain streaming server via a high-speed network. This is not shown in Figure 3-1. In the mobile IPTV scenario, how to perform the hand-off transparently and seamlessly when a mobile device moves from one proxy’s scope to another is still an open problem that we are concerned with.

An example of the VOD use case is shown in Figure 4-10. VOD (Video On Demand) is an important service provided by IPTV, as it gives users a personalized service.

Figure 4-10: VOD use case

Our achievements in EC-GIN can be used in this mobile IPTV scenario in the following ways:

First, the outcome of the resource management efforts in EC-GIN will provide an effective resource discovery and management mechanism for mobile IPTV, which will allow the domain streaming server to quickly find the missing content and send it to the mobile terminal. In this way, the whole mobile IPTV service can be provided more effectively.

Second, the outcome of the Grid Economics effort in EC-GIN may alleviate the traffic problem by applying economic theory, and make the delivery of the IPTV service and accounting information smoother. Moreover, other research efforts in the Grid Economics field, such as pricing, can support an effective charging mechanism, which is of concern to the service provider and the server.

Third, because the wireless network is weaker than the wired network, security is even more important in a mobile environment. The outcome of the security and trust mechanism in EC-GIN may be useful for mobile IPTV, where every mobile terminal should
register and hold a certificate in order to join and leave the system. A security and trust mechanism can protect the mobile IPTV network from malicious attacks and ensure that the service and accounting information can be delivered safely.

Furthermore, a successful implementation of Mobile IPTV has two aspects:
• efficient transmission of multimedia, which makes multicast technology indispensable
• abundant content and services for mobile IPTV, which require an efficient CDN (Content Delivery Network)

As for multimedia transmission, the key technology is multicast, which solves the problem of single-point send/multi-point receive and multi-point send/multi-point receive. It implements highly efficient point-to-multi-point data transmission, saves network bandwidth and reduces network load, as illustrated in Figure 4-11. MBMS (Multimedia Broadcast/Multicast Service) is an IP datacast (IPDC) type of service that can be offered via existing GSM and UMTS cellular networks. The infrastructure makes it possible to use an uplink channel for interactions between the service and the user. This is not straightforward in usual broadcast networks; conventional digital television, for example, is only a one-way (unidirectional) system.

Figure 4-11: Unicast vs. multicast (unicast: one data channel per user; over MBMS: one data channel per TV channel)

MBMS has been standardized in various groups of 3GPP, and the first-phase standards are to be finalized for UMTS Release 6. The service seems to be rather attractive, as quite a lot of operators, equipment manufacturers and other representatives have participated in the standardization work. It can consequently be assumed that several services will be offered via MBMS in the near future. MBMS provides a new method for transferring data to a number of users simultaneously.
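The saving indicated by the labels of Figure 4-11 (one data channel per user for unicast, one data channel per TV channel over MBMS) can be made concrete with a small calculation. The sketch below is illustrative only: the function name and the audience figures are hypothetical, and it counts data channels within a single cell without modelling radio resources.

```python
def channels_needed(viewers_per_tv_channel, multicast):
    """Count the data channels needed in one cell, following the labels of Figure 4-11."""
    if multicast:
        # Multicast: one data channel per TV channel that has at least one viewer.
        return sum(1 for viewers in viewers_per_tv_channel.values() if viewers > 0)
    # Unicast: one data channel per user.
    return sum(viewers_per_tv_channel.values())


# Hypothetical audience in one cell.
audience = {"news": 120, "sports": 75, "movies": 5, "kids": 0}

unicast_channels = channels_needed(audience, multicast=False)
multicast_channels = channels_needed(audience, multicast=True)
print(unicast_channels, multicast_channels)  # 200 3
```

With unicast the number of data channels grows with the number of viewers, whereas with multicast it is bounded by the number of TV channels being watched.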
As a general rule of the evolution path of GSM and UMTS networks and terminals, backward compatibility requirements also apply to MBMS. This means that MBMS will not
interfere with already existing GSM and UMTS services, and mobile terminals not supporting MBMS will work in networks that offer MBMS for customers with MBMS-capable terminals.

Common evaluation parameters of Mobile IPTV include Throughput, Jitter, Join Latency, Leave Latency, Channel Overlap/Channel Gap, Channel Switch Delay, and Channel Change/Zap Delay.

4.2.6 Mobile Grid

The telecommunication operator in China (represented by China Mobile) is experiencing a tremendous expansion of its user base. The types of service provided by the operator have grown from traditional telecommunication to data services, such as games, music and location services. All of these efforts are intended to ensure an increase in Average Revenue Per User (ARPU). The diversification of services increases the complexity of the value chain. The abundance of resources related to the new services moves the service resources from a centralized to a distributed arrangement, which is the basis of the convergence between the Grid and telecommunication networks. On these grounds, the "Mobile IPTV" use case is extended here to the general scenario of the mobile Grid.

Realizing service integration between different operators is both the position of Grid technology in the telecom field and its aim. The main issue of Grid technology is to share resources between different organizations so as to improve the ordering and modularization of services that are offered cooperatively by different organizations. Of course, this idea does not exclude the application of Grid technology in User Equipment (UE). However, it should be noted that, especially in a mobile network, it is less likely that the UE will realize Grid functions, because this involves too many business-oriented aspects (such as billing), and in the whole process the UE is a receiver rather than a resource provider.
Therefore, in the current period the key point of the convergence between Grid technology and the telecom network is not resource provision by the UE, but the integration of service resources across different telecom fields, i.e. the service provisioning ability. In this section, the term "Mobile Grid" therefore refers to the Mobile Operator’s Grid, which mainly focuses on service integration and provisioning in the service network.

4.2.6.1 Service Model

The notion of a service that is considered for the Mobile Grid in EC-GIN combines the definitions of the TeleManagement Forum (TMF), the Open Mobile Alliance (OMA) and the Open Grid Forum (OGF) into:

“A service is ordered by a customer; it is an ability provided by the operator or the service provider according to the requirement specified in an SLA or a contract. This ability is a set of functions which form an integral part of one or more business processes.”

As shown in Figure 4-12, this service model can be divided into four levels: service resource level, service component level, VO level and service level. Service resources can be divided into Grid resources and operator resources according to their ownership. Grid resources are the resources belonging to a third party other than the operator, which may
include programs, computers, data and software. The operator resources include not only the computing, data storage and software in the operator network but also network resources, such as the fixed and wireless access networks, and network entities, such as the billing center, the network management system etc. OMA service enablers or OGSA/WSRF technology integrate these scattered service resources into a set of single functions, which is called a service component. These function sets can be operator-specific sets or new sets. The service resources used by each function are not limited to those of the operator but can be freely organized, and two function sets may intersect. These functions include data synchronization, distributed file storage and directory management. Each virtual organization (VO) integrates these function sets in a service framework and presents the service to the user according to the SLA & Contract. Here, two points should be noted. First, the user orders only one service from one service provider. Second, within the virtual organization, the service combination relations between its inner entities are constrained by the SLA & Contract, which is also a business process.

Figure 4-12: Service Model for Convergence
Levels shown in the figure, from top to bottom:
- Service (SLA & Contract)
- VO: operator leading, other (WS Gateways, OSA Gateways, UDDI, Portal, GridEnhance Framework)
- Service Component: operator inherent function set, new function set (OMA Service Enabler/OGSA/WSRF/Other)
- Service Resource: operator natural resources, other network resources

4.2.6.2 Network Reference Model

Figure 4-13 shows a possible network model based on the current network architecture of the China Mobile Group Co. Ltd. (CMCC). The network service provisioning and management abilities are to be improved by adding logical elements within the operator’s packet-switched core network domain.
These logical elements include service gateways, service management platforms etc. The Grid Function Network Element (GFNE) in the service resource network performs the integration of service resources and the provisioning of function sets. As mentioned above, these function sets are divided into operator resources and other network resources. The GFNEs in the two domains cooperate to perform the integration of service resources and the provisioning of services.
