Alfredo Pagano: 2nd Year PhD Activities Report (2009)
Transcript

  • 1. PhD Programme in Mathematics and Computer Science (Corso di Dottorato in Matematica e Informatica), Università degli Studi di Ferrara. 2nd Year PhD Activities Report (2009). Alfredo Pagano. Advisor: Prof. Eleonora Luppi. Co-advisor: Dr. Mario Reale.
  • 2. Present employment
    - Sysadmin in charge of installing, supporting, and maintaining servers and core services at GARR.
    - The aim of Consortium GARR is to plan, manage, and operate the Italian National Research and Education Network, implementing the most advanced technical solutions and services.
    - Networking Support Activity (EGEE-III SA2): Enabling Grids for E-sciencE (EGEE) is Europe's leading grid computing project, providing a computing support infrastructure for over 10,000 researchers world-wide, from fields as diverse as high energy physics, earth and life sciences.
  • 3. GARR-G: Backbone Physical Infrastructure (PoP-level topology)
    - About 45 GARR PoPs
      - 90% on the premises of universities and research institutions
      - the rest on the premises of telco operators and Internet eXchanges
    - About 60 backbone links
      - mainly leased lines from 8 telco operators
    - Core links: 10 Gbps (STM-64) and 2.5 Gbps (STM-16)
    - Edge links: 34 Mbps, 155 Mbps and 622 Mbps
    - Peering links: 10 Gbps (STM-64 and 10 Gigabit Ethernet), 2.5 Gbps (STM-16) and 1 Gbps (Gigabit Ethernet)
    - Backbone capacity: 120 Gbps
    - Peering capacity: 40 Gbps
  • 4. GARR-G: Backbone Logical Infrastructure
    - Basically, each IP backbone link corresponds to a leased line from a different operator
    - GARR-G is an IP multivendor network
      - Juniper M320, M20 and M10i
      - Cisco 12000, 7500 and 7200
    [Figure: PoP-level logical topology map showing the Juniper (M320, M20, M10i, M7i) and Cisco (7500, 7200) routers at each PoP, link capacities from 34 Mbps to 2x10 Gbps, the carriers providing the leased lines (Telecom, Infracom, Interoute, Fastweb, Wind) and the peerings with GEANT, TELIA, MIX, NaMeX, GlobalX and Level 3.]
  • 5. Outline
    - PhD subject
    - Motivation, Mission and Scope
    - The idea: pros and cons
    - Metrics collected
    - First results
    - Next steps
  • 6. PhD subject
    - 1st year activity: the research activity focused on exploiting and testing network monitoring tools to gather relevant metrics related to end-to-end performance in Grid infrastructures.
    - 2nd year activity: design and create a first working version of a Grid network monitoring tool based on Grid jobs.
  • 7. Motivation (1/2)
    - Debugging networks for efficiency
      - an essential step for anyone wishing to run data-intensive applications
    - Optimizing the performance of Grid middleware and applications to make intelligent use of the network
      - adapting to changing network conditions
    - Supporting the Grid "utility computing" model
      - ensuring that the differing network performances required by particular Grid applications are provided, via measurable SLAs
  • 8. Motivation (2/2)
    - A help for site and Grid operations
      - Helps diagnose performance problems among sites
        - "This transfer is slow, what's broken? The network, the server, the middleware…"
        - "I can't see site X; has the network gone down, or is it just a particular service or machine?"
        - "My application's performance varies with the time of day: is there a network bottleneck?"
      - Helps diagnose problems within sites
        - Most network problems, especially performance issues, are not backbone-related; they are in the "last mile"
      - Helps with planning and provisioning decisions
        - "Is an SLA I've arranged being adhered to by my providers?"
    - For Grid services and middleware
      - "I want to increase the performance of file transfers between sites"
      - "I want to know which compute site is 'closest' to my data, to submit a job to it"
  • 9. The idea: pros and cons
    Instead of installing a probe at each site, run a Grid job!
    - Added value:
      - No installation needed at the sites
      - The monitoring system runs on a proven system (the Grid), with the possibility to use Grid services
      - Direct use of Grid AuthN and AuthZ
    - Limits:
      - The job does not run with root privileges on the Worker Node (WN), so some low-level operations are not permitted
      - Heterogeneity of the WN environments (OS, 32/64 bits, etc.)
        - e.g., making the job download and run an external binary may be tricky (unless it is written in an OS-independent programming language)
      - The system has to deal with the overhead of the Grid mechanisms (delays…)
  • 10. The system in action
    [Diagram: the Central Monitoring Server Program (CMSP) runs on the UI at site paris-urec-ipv6. Jobs are submitted through the WMS to the CEs of sites A, B and C; each job, once running on a WN, opens a socket connection back to the CMSP and signals "Ready!". The CMSP then sends probe requests such as "RTT test to site A" or "BW test to site B" over these connections.]
  • 11. Some remarks
    - The chosen design is more efficient than starting a job for each probe (considering the delays)
    - The TCP connection is initiated by the job
      - No open port is needed on the WN -> better for site security
    - An authentication mechanism is implemented between the job and the server
    - High scalability (backend and frontend can be easily decoupled)
    - A job cannot run forever (GlueCEPolicyMaxWallClockTime), so there are two jobs running at each site:
      - a 'main' one
      - a 'redundant' one, which waits and becomes 'main' when the other one ends
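The slides do not give the wire protocol between the job and the CMSP, but based on these remarks the worker-node side of the design could look roughly like the minimal Python sketch below. The CMSP endpoint, the shared-secret handshake, and the line-based request/response format are all assumptions for illustration.

    import socket

    CMSP_HOST, CMSP_PORT = "cmsp.example.org", 9000  # hypothetical CMSP endpoint
    SECRET = "change-me"                             # assumed shared-secret scheme

    def run_probe(request):
        """Dispatch one probe request, e.g. 'RTT grid005.ct.infn.it'."""
        kind, target = request.split(None, 1)
        # ... run the actual measurement here and return a result string ...
        return "%s %s ok" % (kind, target)

    def main():
        # The job initiates the TCP connection, so no inbound port has to be
        # opened on the WN (better for site security).
        sock = socket.create_connection((CMSP_HOST, CMSP_PORT))
        f = sock.makefile("rw")
        f.write("AUTH %s\nREADY\n" % SECRET)
        f.flush()
        for line in f:            # block until the CMSP sends a probe request
            f.write("RESULT %s\n" % run_probe(line.strip()))
            f.flush()

    if __name__ == "__main__":
        main()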
  • 12. Round-trip time, MTU and hop count tests
    [Diagram: the CMSP on the UI at paris-urec-ipv6 sends a probe request ("RTT test to site C") over the socket connection to the job running on a WN at site B; the job measures against the CE of site C and returns the probe result.]
  • 13. Round-trip time, MTU and hop count tests
    - The 'RTT' measure is the time a TCP connect() call takes, because a connect() involves one round trip of packets:
      - SYN -> and SYN-ACK <- (the round trip)
      - ACK -> (just sent, so it adds no network delay)
      - The results are similar to those of 'ping'
    - The MTU is given by the IP_MTU socket option
    - The number of hops is calculated in an iterative way
    - All these measures require only:
      - connecting to an accessible port (1) on a machine of the remote site
      - closing the connection (no data is sent)
      - Note: this connect/disconnect is detected in the application log
    (1) We use the port of the gatekeeper of the CE, since it is known to be accessible (it is used by gLite).
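In Python, these measurements might look like the sketch below, taken entirely from an unprivileged process. The TTL-stepping scheme for the hop count is an assumption (the slides only say it is computed iteratively), port 2119 (the usual Globus/gLite gatekeeper port) stands in for "an accessible port on a machine of the remote site", and IP_MTU is Linux-specific.

    import socket
    import time

    GATEKEEPER_PORT = 2119   # usual gLite CE gatekeeper port (assumed reachable)

    def rtt_and_mtu(host, port=GATEKEEPER_PORT, timeout=5.0):
        """RTT = duration of connect(): SYN -> SYN-ACK is the round trip,
        the final ACK is just sent.  MTU comes from the Linux-only IP_MTU
        option; no payload is sent and the connection is closed at once."""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        t0 = time.perf_counter()
        s.connect((host, port))
        rtt_ms = (time.perf_counter() - t0) * 1000.0
        IP_MTU = getattr(socket, "IP_MTU", 14)    # value 14 in <linux/in.h>
        mtu = s.getsockopt(socket.IPPROTO_IP, IP_MTU)
        s.close()
        return rtt_ms, mtu

    def hop_count(host, port=GATEKEEPER_PORT, max_ttl=30):
        """One plausible non-root iterative scheme: raise the TTL until the
        handshake completes; SYNs dropped en route make connect() fail."""
        for ttl in range(1, max_ttl + 1):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            s.settimeout(2.0)
            try:
                s.connect((host, port))
                return ttl
            except OSError:
                pass
            finally:
                s.close()
        return None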
  • 14. WN-to-WN BW test (may be obsoleted)
    [Diagram: the CMSP asks the job at site C to open a TCP port <p>, then asks the job at site A to run a BW test to wn-site-C:<p>; the site-A job opens a socket connection, sends a large amount of data, and returns the probe result.]
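A rough sketch of what the two halves of such a test might look like, assuming hypothetical buffer and transfer sizes and leaving out the CMSP signalling that coordinates the two jobs:

    import socket
    import time

    CHUNK = 64 * 1024            # hypothetical buffer size
    TOTAL = 100 * 1024 * 1024    # hypothetical test volume: 100 MB

    def receiver(port):
        """Run by the job at the remote site: accept one connection, drain it."""
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("", port))
        srv.listen(1)
        conn, _ = srv.accept()
        while conn.recv(CHUNK):
            pass
        conn.close()
        srv.close()

    def sender(host, port):
        """Run by the probing job: stream TOTAL bytes, return Mbit/s."""
        s = socket.create_connection((host, port))
        buf = b"\0" * CHUNK
        t0 = time.perf_counter()
        sent = 0
        while sent < TOTAL:
            s.sendall(buf)
            sent += CHUNK
        s.shutdown(socket.SHUT_WR)   # flush: wait until the receiver drains
        s.recv(1)                    # returns b"" when the peer closes
        elapsed = time.perf_counter() - t0
        s.close()
        return (sent * 8) / (elapsed * 1e6)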
  • 15. GridFTP BW test
    [Diagram: the CMSP sends "GridFTP BW test to site C" to the job at site A; the job replicates a big grid file from the SE of site A to the SE of site C, reads the GridFTP log file, and returns the probe result.]
  • 16. WN-to-WN BW test (under discussion)
    - It requires the remote site to allow incoming TCP connections to the WN
      - not a best-practice security policy
      - not always possible (WNs behind a NAT)
      - workarounds are sometimes possible
    - WN-to-WN transfers do not reflect a real use case
    - The WN's network connectivity may not be well suited to this test
  • 17. Metrics collected & scheduling
    - Latency: ping test, every 5 minutes
    - Hop list: traceroute, every 5 minutes
    - MTU size: socket (IP_MTU socket option), every 5 minutes
    - Achievable bandwidth: TCP throughput via a GridFTP transfer between two Storage Elements, every 8 hours
  • 18. 8 sites involved
    A. Paris Urec CNRS
    B. IN2P3 Lyon
    C. INFN-CNAF
    D. INFN-ROMA1
    E. INFN-ROMA-CMS
    F. GRISU-ENEA-GRID
    G. INFN-BARI
    H. INFN-CATANIA
  • 19. Traceroute Paris-Catania
    [pagano@ui-ipv6-testbed ~]$ traceroute grid005.ct.infn.it
    traceroute to grid005.ct.infn.it (193.206.208.18), 30 hops max, 38 byte packets
     1  194.57.137.190 (194.57.137.190)  1.589 ms  1.479 ms  2.696 ms
     2  r-interco-urec.reseau.jussieu.fr (134.157.247.38)  0.331 ms  0.273 ms  0.348 ms
     3  r-jusrap-reel.reseau.jussieu.fr (134.157.254.124)  0.368 ms  0.320 ms  0.318 ms
     4  interco-6.01-jussieu.rap.prd.fr (195.221.127.181)  0.258 ms  0.330 ms  0.255 ms
     5  * * *
     6  te1-2-paris1-rtr-021.noc.renater.fr (193.51.189.230)  1.224 ms  1.127 ms  1.121 ms  MPLS Label=489 CoS=6 TTL=1 S=0
     7  te0-0-0-3-paris1-rtr-001.noc.renater.fr (193.51.189.37)  1.182 ms  1.298 ms  1.150 ms
     8  renater.rt1.par.fr.geant2.net (62.40.124.69)  1.154 ms  1.153 ms  1.128 ms
     9  so-7-3-0.rt1.gen.ch.geant2.net (62.40.112.29)  9.950 ms  9.958 ms  10.018 ms
    10  so-3-3-0.rt1.mil.it.geant2.net (62.40.112.210)  17.264 ms  17.314 ms  17.372 ms
    11  garr-gw.rt1.mil.it.geant2.net (62.40.124.130)  17.369 ms  21.514 ms  17.368 ms
    12  rt1-mi1-rt-mi2.mi2.garr.net (193.206.134.190)  17.555 ms  17.533 ms  17.646 ms
    13  rt-mi2-rt-rm2.rm2.garr.net (193.206.134.230)  26.996 ms  27.071 ms  27.183 ms
    14  rt-rm2-rt-rm1-l1.rm1.garr.net (193.206.134.117)  27.050 ms  27.160 ms  27.062 ms
    15  rt-rm1-rt-ct1.ct1.garr.net (193.206.134.6)  44.854 ms  44.882 ms  44.820 ms
    16  rt-ct1-ru-infngrid.ct1.garr.net (193.206.137.186)  45.115 ms  45.013 ms  45.009 ms
    17  grid005.ct.infn.it (193.206.208.18)  45.014 ms  44.967 ms  44.913 ms
  • 20. Frontend view: LDAP authentication, based on the Google Web Toolkit (GWT) framework
  • 21. Next steps:
    1. Triggering system to alert site and network admins
    2. Frontend improvements (plotting graphs)
    3. Not only scheduled, but also on-demand measurements
    …
  • 22. Thank you for your attention!
  • 23. Backup
  • 24. GridFTP BW test
    - This test shows good results
    - If the GridFTP log file is not accessible (cf. dCache?):
      - we just do the transfer via globus-url-copy and measure the time it takes
      - this is slightly less precise
    - How many streams should we request on the command line? globus-url-copy -p <num_streams> […]
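The fallback could be sketched as follows. The SURLs in the usage comment are hypothetical, and the transfer-size bookkeeping is an assumption (the real tool presumably knows the size of the replicated grid file):

    import subprocess
    import time

    def gridftp_throughput(src_url, dst_url, streams=4, size_bytes=None):
        """Time a globus-url-copy transfer and derive achievable bandwidth.
        Used when the GridFTP server log is not accessible; slightly less
        precise, since it includes command start-up overhead."""
        cmd = ["globus-url-copy", "-p", str(streams), src_url, dst_url]
        t0 = time.perf_counter()
        subprocess.run(cmd, check=True)
        elapsed = time.perf_counter() - t0
        if size_bytes is None:
            return elapsed
        return (size_bytes * 8) / (elapsed * 1e6)   # Mbit/s

    # Hypothetical SURLs; in practice a big replica file would be used:
    # mbps = gridftp_throughput("gsiftp://se-a.example.org/data/big.dat",
    #                           "gsiftp://se-c.example.org/data/big.dat",
    #                           streams=4, size_bytes=1_000_000_000)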
  • 25. Network Performance Factors
    - End-system issues
      - Network Interface Card and driver, and their configuration
      - TCP and its configuration
      - Operating system and its configuration
      - Disk system
      - Processor speed
      - Bus speed and capability
      - Application (e.g., old versions of scp)
    - Network infrastructure issues
      - Obsolete network equipment
      - Configured bandwidth restrictions
      - Topology
      - Security restrictions (e.g., firewalls)
      - Sub-optimal routing
      - Transport protocols
    - Network capacity and the influence of others!
      - Many, many TCP connections
      - Congestion
