Yehia El-khatib and Chris Edwards. "A Survey-based Study of Grid Traffic". In Proceedings of the International Conference on Networks for Grid Applications (GridNets 2007), Lyon, France, October 17-19 2007.
Diversity of Grid Traffic: A Survey-based Study. Yehia El-khatib, Christopher Edwards. Computing Department, Lancaster University.
Outline: Introduction, Survey Goals, Survey Process, Survey Results, Traffic Behaviour, Future Work, Conclusion
Introduction: EC-GIN (Europe-China Grid InterNetworking) is a Framework 6 STREP project. EC-GIN aims to introduce a networking interface that provides programming abstractions to improve the performance of grid applications. The design of the interface requires an understanding of the network characteristics of grid applications.
Survey Goals: The survey aims to highlight some of the characteristics of current grid applications: scale and composition of the grid; dataset granularity; data delivery requirements (time restrictions, encryption, one-to-many services); others (transport-layer protocol, middleware, etc.); special network services.
Survey Process: Questionnaire structure: 2 pages, also an online version; 11 multiple-choice questions plus 1 open-ended question. Level of detail: as simple as possible. Target audience: developers, administrators, and advanced users. Dissemination: research projects that are employing or developing a grid application.
Survey Results [outline]: 1. Research Field 2. Scale 3. Composition 4. Dataset Granularity 5. Special Network Services
Survey Results [1/5] Research Field: [Pie chart of surveyed applications by research field: Software Development, Visualization, Particle Physics, Meteorology, Medicine, Astronomy, Environmental Sciences, Engineering, Social Sciences, Mathematical Analysis. Slice sizes range from 6% to 18%.]
Survey Results [2/5] Scale: [Two bar charts; y-axis: % of surveyed applications; x-axes: number of nodes and number of domains.]
Survey Results [3/5] Composition: [Chart of overall grid composition: Clusters, Desktop Machines, Embedded Devices, Mobile Devices.] 47% are deployed only on clusters: image analysis applications, simulation applications. 7% are deployed only on desktop machines: data management applications.
Survey Results [4/5] Dataset Granularity: [Two charts of dataset sizes; x-axis from 10 kB to 1 TB in factor-of-ten steps.] Most common dataset size is 10 MB. 23% of all datasets are ≤ 1 MB. 12% of all datasets are 100 GB in size. 50% of all datasets are ≤ 10 MB. 25% of all datasets are ≥ 10 GB.
Survey Results [5/5] Special Network Services: [Bar chart; y-axis: % of surveyed applications; responses: Used, Would Be Used, Unnecessary, Not Sure; services: Transfer Delay Prediction, Advanced Network Reservation, Network Topology Information.]
Traffic Behaviour [1/2]: The results give a picture of traffic flow sizes that differs from common belief. We define five distinct classes of applications according to dataset sizes. Class A: less than 10 MB. Class B: 0.5 – 100 MB. Class C: 10 MB – 1 GB. Class D: 100 kB – 100 GB. Class E: 1 MB – 1 TB.
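The five class ranges overlap, so assigning an application to a class needs some matching rule. The sketch below is one possible reading, not the authors' procedure: it encodes the ranges from this slide and picks the narrowest class whose range covers an application's smallest and largest dataset. The function name, the decimal units (1 MB = 10^6 bytes), the lower bound of 0 for class A, and the "narrowest covering range" rule are all assumptions made for illustration.

```python
# Dataset-size ranges per class in bytes (decimal units assumed: 1 MB = 10**6 bytes).
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

CLASSES = {
    "A": (0, 10 * MB),           # "less than 10 MB"; lower bound of 0 is an assumption
    "B": (0.5 * MB, 100 * MB),
    "C": (10 * MB, 1 * GB),
    "D": (100 * KB, 100 * GB),
    "E": (1 * MB, 1 * TB),
}


def narrowest_class(smallest, largest):
    """Hypothetical rule: return the class with the smallest range that
    covers [smallest, largest] (sizes in bytes), or None if no class does."""
    covering = [
        (hi - lo, name)
        for name, (lo, hi) in CLASSES.items()
        if lo <= smallest and largest <= hi
    ]
    return min(covering)[1] if covering else None


if __name__ == "__main__":
    print(narrowest_class(1 * MB, 5 * MB))
    print(narrowest_class(200 * KB, 50 * GB))
```

Under this rule an application with datasets between 1 MB and 5 MB lands in class A even though classes D and E also cover that range; a different tie-breaking rule would give different assignments.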
Traffic Behaviour [2/2]: [Pie chart of class shares: A 34%, B 20%, C 13%, D 13%, E 20%.] The most common class is A, where datasets are no larger than 10 MB. Only 33% of all applications have datasets over 1 GB in size. Only 20% of all applications have datasets that stretch beyond 100 GB. All class C applications are deployed mostly on desktop machines. All class B applications are Astronomy and Meteorology applications, deployed over 100-300 nodes across 6-8 domains.
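The 33% and 20% figures follow from the class shares and the upper end of each class range. The sketch below checks that arithmetic. It assumes the pie-chart labels pair up as A 34%, B 20%, C 13%, D 13%, E 20% (the only pairing consistent with the statements on this slide) and uses decimal units; the variable and function names are illustrative.

```python
# Class shares as percent of surveyed applications (pairing of labels is an assumption).
SHARES = {"A": 34, "B": 20, "C": 13, "D": 13, "E": 20}

# Upper end of each class's dataset-size range in bytes (decimal units assumed).
MB, GB, TB = 10**6, 10**9, 10**12
UPPER = {"A": 10 * MB, "B": 100 * MB, "C": 1 * GB, "D": 100 * GB, "E": 1 * TB}


def share_exceeding(threshold):
    """Percent of applications whose class range extends beyond `threshold` bytes."""
    return sum(SHARES[c] for c in SHARES if UPPER[c] > threshold)


total = sum(SHARES.values())             # all five classes together
over_1gb = share_exceeding(1 * GB)       # classes D and E
over_100gb = share_exceeding(100 * GB)   # class E only

if __name__ == "__main__":
    print(total, over_1gb, over_100gb)
```

Classes D and E are the only ones whose ranges extend past 1 GB, which gives 13% + 20% = 33%; only class E extends past 100 GB, which gives 20%.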
Future Work: We intend to monitor the traffic created by a number of grid applications. We aim to present mathematical models of grid traffic that could be used to generate artificial grid traffic in simulators.
Conclusion: We presented the outcome of a survey of grid application requirements and network behaviour. The results reflect real demands of grid applications, which provide a solid starting point for the design of our interface. The suggested classification portrays the diversity in the traffic footprint of grid applications.