2009.08 grid peer-slides

Yehia El-khatib, Chris Edwards, Michael Mackay and Gareth Tyson. "Providing Grid Schedulers with Passive Network Measurements". In Proceedings of the 18th International IEEE Conference on Computer Communications and Networks: Workshop on Grid and P2P Systems and Applications (GridPeer 2009), San Francisco, CA, USA, August 2-6 2009.

  1. Yehia El-khatib, Christopher Edwards, Michael Mackay, and Gareth Tyson
     Computing Department, Lancaster University, UK
     www.ec-gin.eu
  2. Motivation
     Objective
     System Overview
     ◦ The GridMAP Service
     ◦ Monitoring Daemon
     Network Measurement Evaluation
     Conclusions
     Future Work
  3. Why measure the network?
     Public networks are unpredictable.
     ◦ Heterogeneous components
     ◦ Independent standards and protocols
     ◦ IP provides best-effort, “one size fits all” delivery.
     This can hinder the performance of any networked application.
     IP networks do not readily provide feedback about their operational performance.
     Hence, numerous network monitoring tools:
     ◦ management, troubleshooting, and/or pre- and post-deployment probation.
  4. Why measure the network in grids?
     Stand-alone tools are ad hoc and manual.
     Grids are dynamic systems that aggregate resources to run very demanding applications.
     High performance is always expected, and hence contention for resources is similarly high.
     Efficient scheduling is only possible if correct resource information is accessible.
     Current middleware and Grid Information Systems (GIS) are ineffective and cumbersome.
     ◦ Information is insufficient and/or outdated
     ◦ Needs to be gathered from different publishers
  5. Our aim is to enable schedulers to make more informed decisions on node selection and resource allocation.
     Grid schedulers need a means of obtaining knowledge about changes in the grid:
     ◦ changes in the availability of remote computational resources
     ◦ changes in the network path to those resources
     This requires accurate end-to-end measurements to be provided to schedulers.
  7. GridMAP (Grid Monitoring, Analysis and Prediction) is a distributed system which collects network performance and resource availability information.
     This information is used by GridMAP to provide, analyze and predict performance and availability.
     It is made up of:
     ◦ Grid Service
     ◦ Monitoring Daemon
  8. It is a grid service; i.e. a WSRF Web Service that also conforms to the OGSI standard.
     It provides a set of standard interfaces that allow convenient access for schedulers.
     The retrieved information can be incorporated by schedulers into job and data allocation processes to automatically adapt to perceived and foreseeable resource and network status.
  9. A daemon runs on each grid node to measure resource and network performance.
     These measurements are sent to the GridMAP service to be indexed and stored.
  10. Two example requests (diagram):
      ◦ Application → Scheduler: run job xyz; requirements: delay, CPU, memory.
        Scheduler ↔ GridMAP: per-node table of Delay, CPU and Memory, each with Average and Predicted values.
      ◦ Application → Scheduler: copy file; requirements: delay, throughput, disk space.
        Scheduler ↔ GridMAP: per-node table of Delay, Throughput and Disk Space, each with Average and Predicted values.
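Slide 10 shows a scheduler matching job requirements against per-node average and predicted values. A minimal sketch of that selection step follows. It is not GridMAP's interface: the `NodeMetrics` fields, the units, and the "lowest predicted delay wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    """One row of a hypothetical per-node table (names and units are assumed)."""
    node: str
    delay_ms_avg: float
    delay_ms_pred: float
    cpu_free_avg: float      # fraction of CPU available, 0.0 to 1.0
    cpu_free_pred: float
    mem_free_mb_avg: float
    mem_free_mb_pred: float


def select_node(rows, max_delay_ms, min_cpu_free, min_mem_free_mb):
    """Return the name of the eligible node with the lowest predicted delay.

    A node is eligible when its *predicted* values satisfy every requirement.
    Returns None when no node qualifies.
    """
    eligible = [
        r for r in rows
        if r.delay_ms_pred <= max_delay_ms
        and r.cpu_free_pred >= min_cpu_free
        and r.mem_free_mb_pred >= min_mem_free_mb
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.delay_ms_pred).node


# Illustrative values only; they are not measurements from the slides.
rows = [
    NodeMetrics("n1", 28.0, 30.0, 0.55, 0.50, 2100.0, 2048.0),
    NodeMetrics("n2", 11.0, 12.0, 0.15, 0.10, 8000.0, 8192.0),
    NodeMetrics("n3", 17.0, 18.0, 0.65, 0.60, 4000.0, 4096.0),
]
chosen = select_node(rows, max_delay_ms=25.0, min_cpu_free=0.3, min_mem_free_mb=1024.0)
```

Filtering on predicted rather than average values is one possible policy; a scheduler could equally combine both columns.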
  11. Measurements are accessible via one interface, from one publisher.
      Behind the interface is a distributed application:
      ◦ distributed repository: automatic replication, no single point of failure, ensures resilience
      ◦ makes it possible to afford the demanding costs of storing, indexing and analyzing the measurements
  13. Why passive measurements?
      ◦ Active techniques oblige the network to accommodate artificial traffic probes in addition to real traffic, decreasing overall performance. e.g. TTCP, iperf, UDPmon
      ◦ Passive techniques: arguably less accurate. e.g. Sting, Synack, IPTraf
      ◦ ICMP messaging: It is not uncommon for ICMP to be disabled or treated differently from TCP traffic. e.g. ping, fping, traceroute
      Best of both worlds: to avoid added network overhead without compromising accuracy.
  14. Why is passive measurement relevant for grid systems?
      ◦ Grid nodes constantly exchange data sets, job state, result sets, and control signals.
      ◦ Most if not all grid traffic is TCP-based.
      We exploit such frequent TCP interactions to extract network metrics (RTT, throughput).
      This technique is not viable in systems other than grids, which is partly the reason why other TCP-based passive techniques are usually supplemented with active probes.
  15. Uses pcap to capture packet headers.
      ◦ 3-way handshake is used to measure RTT.
      ◦ As connections end, throughput is calculated.
      Metrics are calculated for each flow, and stored in a local cache.
      The daemon also measures availability of local resources (such as CPU, memory, etc.).
      On a regular basis, these ‘performance snapshots’ are sent to the GridMAP service.
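The per-flow bookkeeping on slide 15 can be sketched as follows. This is an illustration under stated assumptions, not the daemon's code: it takes already-parsed TCP header fields rather than calling a capture library, the `FlowTracker` class and its field names are hypothetical, RTT is taken as the gap between an outgoing SYN and the matching incoming SYN-ACK, and throughput is the outgoing payload bits divided by the time from the SYN to the first FIN or RST.

```python
# Standard TCP flag bits.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


class FlowTracker:
    """Tracks connections opened by the local host (hypothetical helper)."""

    def __init__(self, local_ip):
        self.local_ip = local_ip
        self.flows = {}   # active flows keyed by (local ip, local port, remote ip, remote port)
        self.cache = []   # finished per-flow metrics, i.e. the local cache

    def on_header(self, ts, src, dst, sport, dport, flags, payload_len):
        """Feed one captured TCP header; ts is the capture time in seconds."""
        outgoing = src == self.local_ip
        key = (src, sport, dst, dport) if outgoing else (dst, dport, src, sport)

        # Outgoing SYN without ACK: start (or restart) tracking this flow.
        if outgoing and flags & SYN and not flags & ACK:
            self.flows[key] = {"syn_ts": ts, "rtt": None, "bytes": 0}
            return

        flow = self.flows.get(key)
        if flow is None:
            return  # handshake was not observed; ignore the packet

        # Incoming SYN-ACK completes the RTT sample.
        if not outgoing and flags & SYN and flags & ACK and flow["rtt"] is None:
            flow["rtt"] = ts - flow["syn_ts"]

        if outgoing:
            flow["bytes"] += payload_len

        # First FIN or RST in either direction ends the flow.
        if flags & (FIN | RST):
            duration = ts - flow["syn_ts"]
            throughput = flow["bytes"] * 8 / duration if duration > 0 else None
            self.cache.append(
                {"flow": key, "rtt_s": flow["rtt"], "throughput_bps": throughput}
            )
            del self.flows[key]


# Illustrative trace with made-up timestamps and addresses.
tracker = FlowTracker("10.0.0.1")
tracker.on_header(0.00, "10.0.0.1", "10.0.0.2", 5000, 80, SYN, 0)
tracker.on_header(0.02, "10.0.0.2", "10.0.0.1", 80, 5000, SYN | ACK, 0)
tracker.on_header(0.02, "10.0.0.1", "10.0.0.2", 5000, 80, ACK, 0)
tracker.on_header(0.50, "10.0.0.1", "10.0.0.2", 5000, 80, ACK, 1000)
tracker.on_header(1.00, "10.0.0.1", "10.0.0.2", 5000, 80, ACK | FIN, 1000)
```

A retransmitted SYN restarts the flow in this sketch, so the RTT sample is measured from the last SYN sent.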
  17. Aim: to verify the accuracy of the obtained measurements.
      Setup:
      ◦ 5 connections of varying lengths.
      ◦ Trigger 34 iperf probes of different durations (1-500 seconds).
      ◦ Run the daemon on the sending node.
      ◦ Compare results against those of ping and iperf.
  18. Experiment 1: Ethernet connection, 1 hop, ~0.57 ms
  19. Experiment 2: Local DSL connection, 4 hops, ~19 ms
  20. Experiment 3: Lancaster → Oxford, 12 hops, ~9 ms
  21. Experiment 4: Lancaster → Munich, 15 hops, ~29 ms
  22. Experiment 5: Innsbruck → Lancaster, 17 hops, ~48 ms
  23. [Plots: Ethernet, Oxford, Munich, Innsbruck]
      Note: During the DSL connection test, ping packets did not get through due to disabled ICMP messaging.
  24. [Plots: Ethernet, DSL, Oxford]
  25. [Plots: Munich, Innsbruck]
      On average, our measurements were:
      ◦ 1.55% away from the minimum ping values and 2.33% away from the mean ping values
      ◦ 2.20% away from the iperf measurements
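The slides do not define how the "% away" figures on slide 25 were computed. One plausible reading, assumed here, is the mean absolute relative difference between each passive measurement and its reference value (ping or iperf). The function names and sample values below are illustrative only.

```python
def percent_deviation(measured, reference):
    """Absolute difference between measured and reference, as a percentage of reference."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return abs(measured - reference) / abs(reference) * 100.0


def mean_percent_deviation(pairs):
    """Average percent deviation over (measured, reference) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (measured, reference) pair is required")
    return sum(percent_deviation(m, r) for m, r in pairs) / len(pairs)


# Made-up (passive RTT, ping RTT) pairs in milliseconds, not data from the slides.
sample_pairs = [(10.2, 10.0), (49.0, 50.0)]
average_deviation = mean_percent_deviation(sample_pairs)
```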
  27. Daemon works in an entirely passive fashion:
      ◦ no disruption caused to real traffic
      ◦ measurements cannot be mistaken for threats such as TCP-SYN floods or DoS attacks
      Independent operation:
      ◦ no need for peer coordination/synchronization
      ◦ no reliance on router accounting schemes (e.g. IP accounting, NetFlow)
      Monitoring traffic becomes an automatic process.
      The technique is quite trivial, but provides a powerful viewpoint which results in measurements that directly reflect the experience of grid traffic.
  28. Development of the GridMAP grid service is ongoing.
      We will expand the range of metrics.
      ◦ e.g. one-way delay variation is important to virtualization applications
      We then plan to test our technique against more active and passive measurement tools.
  29. Yehia El-khatib (yehia), Christopher Edwards (ce), Michael Mackay (m.mackay), Gareth Tyson (g.tyson) @ comp.lancs.ac.uk
      Computing Department, Lancaster University, Lancaster, LA1 4WA, United Kingdom
      www.ec-gin.eu
